


Vol. 1931 


THE ANNALS 


of 


MATHEMATICAL 
STATISTICS 


(Printed in U. S. A.) 


Copyright 1931 




Published and Lithoprinted by 
EDWARDS BROTHERS, INC. 
ANN ARBOR, MICH. 



















EDITORIAL COMMITTEE 


H. C. Carver, Editor 
J. W. Edwards, Business Manager 


A quarterly publication of the American Statistical Association, 
devoted to the theory and application of Mathematical Statistics. 


Six dollars per annum. 


Reprints of any article in this volume may be obtained at any time 
from the Editor at the following rates, postage included. 


Number of Copies        Cost per Page

  1- 4                  2 cents
  5-24                  1½ cents
 25-49                  1 cent
 50 and over            ¾ cent


Address: Editor, Annals of Mathematical Statistics
Post Office Box 171, Ann Arbor, Michigan



















THE ANNALS OF 
MATHEMATICAL STATISTICS 


VOL. II                    FEBRUARY                    NO. 1











CONTENTS 


THE RELATION BETWEEN STABILITY AND HOMOGENEITY . . . . . . 1
By L. v. Bortkiewicz


ON SS ‘ 23 
By E. C. Molina 


ON CERTAIN PROPERTIES OF FREQUENCY DISTRIBUTIONS OBTAINED BY A
LINEAR FRACTIONAL TRANSFORMATION OF THE VARIATES OF A GIVEN
DISTRIBUTION . . . . . . 38
By H. L. Rietz


ON SMALL SAMPLES FROM CERTAIN NON-NORMAL UNIVERSES . . . . . . 48
By Paul R. Rider


AN EMPIRICAL DETERMINATION OF THE DISTRIBUTION OF MEANS, STANDARD
DEVIATIONS, AND CORRELATION COEFFICIENTS DRAWN FROM RECTANGULAR
POPULATIONS . . . . . . 66
By Hilda Frost Dunlap


THE INTERDEPENDENCE OF SAMPLING AND FREQUENCY DISTRIBUTION
THEORY . . . . . . 82
Editorial


NOTE ON THE DISTRIBUTION OF MEANS OF SAMPLES OF N DRAWN FROM A
TYPE A POPULATION
By Cecil C. Craig








PUBLISHED QUARTERLY BY
AMERICAN STATISTICAL ASSOCIATION


Publication Office: Edwards Brothers, Inc., Ann Arbor, Mich.
Editorial Office: University of Michigan, Ann Arbor, Mich.


Application for entry as second class matter pending 











EDITORIAL COMMITTEE


H. C. Carver, Editor
A. L. O’Toole, Assistant Editor
J. W. Edwards, Business Manager





A quarterly publication sponsored by the American Statistical Association, 
devoted to the theory and application of Mathematical Statistics. 


Six dollars per annum 


THE RELATIONS BETWEEN STABILITY AND HOMOGENEITY*


By L. v. Bortkiewicz


The idea of investigating the stability of statistical frequencies
from the standpoint of the theory of probability goes back to the
French mathematician Bienaymé. From various examples taken from
social and moral statistics, he was the first to establish the fact
that, almost without exception, the stability in question was
essentially less than the “classical norm,” that is, less than the
expectation which is associated with the classical scheme of
independent trials with a constant underlying probability. In order
to explain this discrepancy between theory and observation, Bienaymé
used a modification of the traditional procedure which was
characterized by the assumption that between neighboring trials in a
time ordered sequence a sort of dependence existed. Though this
method is interesting in itself and was, among other things, adopted
by Cournot as his own, we shall replace it in what follows by
another, originating from Lexis, which has the advantage of a wider
usefulness, in that it can be applied not only to undulatory but
also to evolutory sequences.¹

*Translated by A. R. Crathorne. Read before the American Statistical
Association at Cleveland, Ohio, December 30, 1930.

Let us assume that for a series of $z$ successive time intervals,
say years, we have found that some event (accident, death, marriage,
crime) has happened $x_1, x_2, \ldots, x_z$ times, and that the
corresponding numbers of “trials,” that is, the numbers of persons
observed, are $s_1, s_2, \ldots, s_z$, so that the quotients

$$y_1 = \frac{x_1}{s_1}, \quad y_2 = \frac{x_2}{s_2}, \quad \ldots, \quad y_z = \frac{x_z}{s_z}$$

represent a time ordered sequence of relative frequencies. Instead of
assuming, as the traditional theory demands, that each term $y_k$ of
this series corresponded to a common fundamental probability $p$
weighted with accidental errors, Lexis assumed that each value $y_k$
was associated with a distinct probability $p_k$.

As a result of this, the expected amplitude of the fluctuations of
the values $y_k$ increased, and the greater the variations in the
$p_k$’s the greater the amplitude. Under the simplifying hypothesis
$s_k = \text{const.}$ ($= s$), the corresponding standard deviation
$\sigma$ is defined by

$$\sigma^2 = \frac{1}{z} \sum_{k=1}^{z} (y_k - y)^2, \qquad y = \frac{1}{z} \sum_{k=1}^{z} y_k .$$

For the case of a constant $p$ we may write

$$E(\sigma^2) = \frac{z-1}{z} \, \frac{p(1-p)}{s} \tag{1}$$

where $E$ denotes “expectation.” In the Lexis procedure with a
variable $p_k$, using the notation

$$u^2 = \frac{z-1}{z} \, \frac{p(1-p)}{s}, \qquad p = \frac{1}{z} \sum_{k=1}^{z} p_k, \qquad w^2 = \frac{1}{z} \sum_{k=1}^{z} (p_k - p)^2,$$


¹Bienaymé, in the journal “L’Institut,” Vol. 7 (1831), pages 187-189,
and in “Journal de la Société de Statistique de Paris,” 17e (1876),
pages 199-204.
A. Cournot, Exposition de la théorie des chances et des probabilités,
Paris, 1843, Nos. 79 and 117.

W. Lexis, “Über die Theorie der Stabilität statistischer Reihen,” in
the Jahrbücher für Nationalökonomie und Statistik, Vol. 32 (1879),
pages 60 . . ., reprinted in Abhandlungen zur Theorie der
Bevölkerungs- und Moralstatistik, Jena, 1903, pages 170-212.





the corresponding relation

$$E(\sigma^2) = u^2 + \frac{zs - z + 1}{zs} \, w^2 \tag{2}$$

can be derived.²
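
A sketch of how a relation of the form (2) can be obtained is given
here. It assumes the definitions $\sigma^2 = \frac{1}{z}\sum_k (y_k - y)^2$,
$u^2 = \frac{z-1}{z}\frac{p(1-p)}{s}$ and $w^2 = \frac{1}{z}\sum_k (p_k - p)^2$,
and it further assumes that the counts $x_k$ are independent binomial
variables with $s$ trials and probabilities $p_k$. These assumptions
are stated for the purpose of the sketch and are not a quotation from
Lexis or from the author.

```latex
% E(y_k^2) = p_k^2 + p_k(1-p_k)/s   and
% E(y^2)   = p^2 + (1/z^2) \sum_k p_k(1-p_k)/s
\begin{align*}
E\Big[\sum_{k=1}^{z} (y_k - y)^2\Big]
  &= \sum_{k} E(y_k^2) - z\,E(y^2) \\
  &= \sum_{k} p_k^2 - z p^2
     + \Big(1 - \frac{1}{z}\Big)\frac{1}{s}\sum_{k} p_k(1-p_k) \\
  &= z w^2 + \frac{z-1}{zs}\,\big[\,z\,p(1-p) - z\,w^2\,\big].
\end{align*}
% Dividing by z:
\[
E(\sigma^2) = \frac{z-1}{z}\,\frac{p(1-p)}{s}
            + \Big(1 - \frac{z-1}{zs}\Big) w^2
            = u^2 + \frac{zs - z + 1}{zs}\, w^2 .
\]
```

For $s$ in the tens of thousands the factor multiplying $w^2$ differs
from 1 by less than one part in ten thousand, which is why the
simpler form $u^2 + w^2$ serves in practice.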

In the following numerical examples the numbers of observations
$s_k$ are never less than some ten thousands, while $z = 10$. Hence,
as far as these and similar examples are concerned, the numerical
results are not appreciably altered if, instead of (2), we use

$$E(\sigma^2) = u^2 + w^2 . \tag{3}$$

However, a certain inaccuracy arises if, in the application of
formula (3) to the raw data, one has disregarded the fundamental
assumption that $s_k$ is constant and in the expression for $u^2$
has replaced $s$ by the arithmetic mean of the $z$ values $s_k$. If,
however, the latter differ little from one another, such a procedure
gives rise to no great discrepancy. Lexis called the quantities $u$
and $w$ in formula (3) the two “fluctuation components,” which
combine (according to the law of composition of forces) to give the
expected total fluctuation. The quantity $u$ gives expression to the
effect of the “accidental causes” in the sense of the theory of
probability, and this effect grows less and less with increasing $s$
until it vanishes for $s \to \infty$. For this reason Lexis called
$u$ the normal component. He also used the term “unessential
fluctuation component.” On the other hand, $w$ depends on the
variations of the fundamental probability, that is, on the underlying
general conditions, and in this sense was designated by Lexis as the
physical component. We may also call it the essential component.

²One does not find formula (2) in Lexis’s work. He was satisfied at
this point with a rather inexact method yielding an approximate
result. However, this did not affect the essential part of his
discussion.


The first of the two components $u$ and $w$ can be easily calculated
directly with sufficient approximation. The usual method is to
substitute for the unknown $p$ in the expression for $u^2$ the value
$y$, the arithmetic mean of the frequencies $y_k$, obtaining

$$u^2 = \frac{z-1}{z} \, \frac{y(1-y)}{s} . \tag{4}$$

As for the second component $w$, it is calculated by the indirect
method of substituting $\sigma^2$ for $E(\sigma^2)$ in (3), and then
$w$ is found from $w^2 = \sigma^2 - u^2$. This method, however,
assumes that $\sigma > u$, or what is the same thing, that the
dispersion coefficient, $Q = \sigma/u$, is greater than 1. In his
older papers, Lexis distinguished between subnormal, normal and
supernormal dispersion, according to whether $Q$ was distinctly less
than 1, approximately equal to 1, or distinctly greater than 1, and
found that in social and moral statistics the subnormal dispersion
never occurred and the normal rarely. Supernormal dispersion was the
rule. So Lexis based his scheme of a varying underlying probability
on the case of supernormal dispersion. In fact, from formula (3), we
have

$$E(Q^2) = 1 + \left(\frac{w}{u}\right)^2 \tag{5}$$

which says that the variations in the underlying probability lead
us to expect values of $Q$ greater than unity.¹
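
The calculation just described can be sketched in a few lines of
Python. The sketch assumes a constant number of trials $s$, the
divisor $z$ in $\sigma^2$, and the factor $(z-1)/z$ in $u^2$; the
function name `lexis_components` and the sample counts are
illustrative and do not come from the paper.

```python
from math import sqrt

def lexis_components(x, s):
    """Return sigma^2, u^2, w^2 and Q for yearly counts x and a
    common number of trials s (assumed constant over the years)."""
    z = len(x)
    y_k = [xk / s for xk in x]                   # relative frequencies
    y = sum(y_k) / z                             # their arithmetic mean
    sigma2 = sum((v - y) ** 2 for v in y_k) / z  # observed fluctuation
    u2 = (z - 1) / z * y * (1 - y) / s           # normal component, as in (4)
    w2 = sigma2 - u2                             # essential component via (3)
    q = sqrt(sigma2 / u2)                        # dispersion coefficient
    return {"sigma2": sigma2, "u2": u2, "w2": w2, "Q": q}

# Hypothetical counts over five years with 100,000 persons observed.
result = lexis_components([90, 110, 100, 120, 80], 100_000)
print(result["Q"], result["w2"])
```

Note that `w2` comes out negative whenever $Q < 1$, so the indirect
method yields no real value of $w$ in the subnormal case.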

Notwithstanding the fact that $Q$ was usually greater than unity,
Lexis did not consider this a proof that his scheme adequately
described the actual facts. In addition to this he was more concerned
with the fact that in experience $Q$ showed a tendency to decrease
with decreasing number of “trials,” that is, with decreasing $s$.
Indeed, in a series of examples, Lexis had shown that a value of $Q$
which was decidedly greater than unity when calculated for an entire
country, decreased to nearly 1 when the data for the single
administrative districts of the same country were used. Lexis
considered such behavior of $Q$ as entirely in harmony with his
scheme.

¹Under the influence of accidental causes, $Q$ may be less than unity
not only for constant, but also for varying underlying probabilities,
and this circumstance must be considered in the determination of $w$.
It would carry us too far afield to go further into this matter.

If we write formula (5) in the form

$$E(Q^2) = 1 + \frac{z}{z-1} \, \frac{s \, w^2}{p(1-p)} , \tag{6}$$

we see that the excess of $Q^2$ over and above 1 is in expectation
directly proportional to $s$. This was the explanation of the
decrease of $Q$ with decreasing $s$, for, as Lexis said, we have no
ground to expect that $s$ being large or small had any bearing on
the value of $w$.

It is this last point about which the criticism of Lexis’s
dispersion theory centers. Notwithstanding the endeavors of Lexis to
fit his theory to statistical reality, we can show that the facts
were against him as far as his assumption that $w$ is fundamentally
independent of $s$ is concerned. If this assumption were true, then
formula (6) tells us distinctly how $Q$ decreases with diminishing
$s$. We learn from experience that as a rule this decrease in $Q$ is
less than that given by the formula; from which it follows that the
essential component, $w$, has a tendency to increase with decreasing
$s$.

If we desire to investigate just what happens in reality, a certain
complication arises, because we are never able to compare groups
which differ among one another as to $s$, but not as to $p$ (or
$y$). In order to eliminate to some extent the variations of $p$ we
consider the ratio of $w$ to $p$. Let $w/p = \beta$, and call
$\beta$ the relative essential component to distinguish it from the
absolute essential component $w$. Formula (6) then becomes the
following:

$$E(Q^2) = 1 + \frac{z}{z-1} \, \frac{sp}{1-p} \, \beta^2 . \tag{7}$$

The product $sp$ can be considered as the expected number of
“successes.” For a constant $s_k$ ($= s$) we have

$$E(x_k) = s p_k , \qquad E\Big(\frac{1}{z} \sum_{k=1}^{z} x_k\Big) = sp ,$$

and, letting $s = \frac{1}{z} \sum_{k} s_k$, the last relation is
true with sufficient approximation for a variable $s_k$, provided the
variation is not too pronounced. Let $sp = m$. Often, as in the
examples which follow, $p$ is so small that we can consider $(1 - p)$
as equal to 1. Formula (7) then becomes

$$E(Q^2) = 1 + \frac{z}{z-1} \, m \beta^2 . \tag{8}$$


The question as to whether there is a connection between $s$ and $w$
is now changed to an investigation of the relationship between $m$
and $\beta$. In undertaking such an investigation empirically, we
compare, as to the behavior of $m$ and $\beta$, a statistical
aggregate considered as a total with its component parts considered
as partial aggregates. Let the number of the partial aggregates be
$n$, and let the corresponding values of $m$ and $\beta$, as well as
$u$, $w$ and $\sigma$, be indicated by the subscript $i$, which can
also serve as the ordinal number of the partial aggregate. For the
total aggregate, let $i = 0$. The symbols $s_{i,k}$, $x_{i,k}$,
$y_{i,k}$, $p_{i,k}$ are the $s$, $x$, $y$, $p$ of the $i$-th
partial aggregate and the $k$-th time interval. We also use the
notation

$$s_i = \frac{1}{z} \sum_{k=1}^{z} s_{i,k}, \qquad x_i = \frac{1}{z} \sum_{k=1}^{z} x_{i,k}, \qquad y_i = \frac{1}{z} \sum_{k=1}^{z} y_{i,k}, \qquad p_i = \frac{1}{z} \sum_{k=1}^{z} p_{i,k},$$

from which we have

$$s_0 = \sum_{i=1}^{n} s_i, \qquad x_0 = \sum_{i=1}^{n} x_i, \qquad y_0 = \frac{1}{s_0} \sum_{i=1}^{n} s_i y_i, \qquad p_0 = \frac{1}{s_0} \sum_{i=1}^{n} s_i p_i .$$

We have also the following relations:

$$m_i = s_i p_i, \qquad \sigma_i^2 = \frac{1}{z} \sum_{k=1}^{z} (y_{i,k} - y_i)^2,$$

$$u_i^2 = \frac{z-1}{z} \, \frac{p_i (1 - p_i)}{s_i}, \qquad w_i^2 = \frac{1}{z} \sum_{k=1}^{z} e_{i,k}^2,$$

where $e_{i,k} = p_{i,k} - p_i$ and $\beta_i = w_i / p_i$,

$$E(\sigma_i^2) = u_i^2 + w_i^2, \qquad Q_i = \frac{\sigma_i}{u_i},$$

and using the notation $e_{i,k}/p_i = \varepsilon_{i,k}$ we have
further

$$\beta_i^2 = \frac{1}{z} \sum_{k=1}^{z} \varepsilon_{i,k}^2 .$$








Finally, corresponding to formula (8), we have

$$E(Q_i^2) = 1 + \frac{z}{z-1} \, m_i \beta_i^2 . \tag{9}$$


We shall now apply these formulas to statistics on the frequency of
suicides in Germany for the decade 1902-1911. The numbers of
“trials,” $s_{i,k}$, are here the populations of the regions in
question; the “successes,” $x_{i,k}$, are the numbers of suicides
for each year. The relative frequencies, $y_{i,k}$, are found by
dividing the numbers of suicides by the corresponding populations.
Like various other kinds of social phenomena, the suicides in
pre-war German statistics were grouped according to states, the
provinces of Prussia, right Rhenish Bavaria and left Rhenish Bavaria
being included as states. In this way we have forty territories of
very unequal size. For the decade 1902-1911, the mean population of
the territories ranged from a maximum of 6,587,000 (Rhine Province)
to a minimum of 45,000 (Schaumburg-Lippe). The maximum average number
of suicides per annum was 1453 (Saxony) and the minimum 7
(Schaumburg-Lippe). Corresponding to the purpose of the
investigation, these suicide figures $x_i$, which can be considered
as approximations to $m_i$, were arranged in descending order, with
$x_1 = 1453$ and $x_{40} = 7$.

For the whole of Germany, we have $x_0 = 13173$,
$y_0 = 214 \cdot 10^{-6}$ (that is, an average number of 214 suicides
per annum for each million population). The ten values $y_{0,k}$
vary between $204 \cdot 10^{-6}$ and $223 \cdot 10^{-6}$. These
fluctuations are markedly greater than one expects from the classical
norm. The calculation of the dispersion quotient gives $Q_0 = 3.14$,
which, as the Lexis theory demands, is greater than any one of the
40 values of $Q_i$.¹ These values give 2.03 as a maximum and 0.75 as
a minimum. Fixing attention on the eight smallest values of $x_i$,
we find an average value of 1.02 for $Q_i$, and of the eight values,
three are larger and five less than 1. So in this example the
dispersion becomes very nearly 1 by narrowing the observation field.

¹A study of suicides and of homicides in the United States yields
much the same general results as those shown here for suicides in
Germany. (Note by the translator.)


But we have still to find out whether $Q_i$ decreases with $x_i$
according to the measure of decrease that one would expect under the
hypothesis that $\beta_i$ is fundamentally independent of $x_i$. To
decide this question, we let $\beta_i = \text{const.} = \beta$,
including $\beta_0 = \beta$, and substitute also $x_i$ for $m_i$ in
formula (9). We have then on the one hand, in expected values,

$$Q_0^2 = 1 + \frac{z}{z-1} \, x_0 \beta^2$$

and on the other hand

$$\frac{1}{n} \sum_{i=1}^{n} Q_i^2 = 1 + \frac{z}{z-1} \, \frac{x_0}{n} \, \beta^2 ,$$

from which follows

$$\frac{1}{n} \sum_{i=1}^{n} Q_i^2 = 1 + \frac{1}{n} \left( Q_0^2 - 1 \right) .$$

However, in our example, we find

$$\frac{1}{n} \sum_{i=1}^{n} Q_i^2 = 1.56, \qquad 1 + \frac{1}{n} \left( Q_0^2 - 1 \right) = 1.22 ,$$

and the difference 0.34 cannot be ascribed to chance, for it is three
times the probable error (the determination of which we cannot now
take up). We must, then, assume that the average of the values
$\beta_i$, for $i = 1$ to 40, is greater than $\beta_0$. Why this is
so we shall see in the following discussion.
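
The arithmetic of this comparison can be checked directly. The
sketch below assumes the relation
$\frac{1}{n}\sum Q_i^2 = 1 + \frac{1}{n}(Q_0^2 - 1)$, which holds in
expectation when every $\beta_i$ equals $\beta_0$; the function name
is illustrative only, and the figures are those quoted in the text.

```python
def expected_mean_q2(q0, n):
    """Mean of Q_i^2 expected over n partial aggregates if the
    relative essential component were the same for all of them."""
    return 1 + (q0 ** 2 - 1) / n

q0 = 3.14         # dispersion coefficient for the whole of Germany
n = 40            # number of territories
observed = 1.56   # mean of Q_i^2 reported for the 40 territories

expected = expected_mean_q2(q0, n)
print(round(expected, 2))             # 1.22
print(round(observed - expected, 2))  # 0.34
```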


We consider now the mutual relationship between the deviations
$\varepsilon_{i,k}$ and $\varepsilon_{j,k}$ which refer to two
arbitrary territories $N_i$ and $N_j$, and we build up, according to
the formula for a correlation coefficient, the expression

$$\gamma_{i,j} = \frac{1}{z} \sum_{k=1}^{z} \frac{\varepsilon_{i,k} \, \varepsilon_{j,k}}{\beta_i \beta_j} .$$

The number of combinations of the subscripts $i$ and $j$ is
$\frac{n(n-1)}{2}$, so there are that many values $\gamma_{i,j}$.
Finally we construct a weighted arithmetic mean of these values
according to the formula

$$\gamma = \frac{\displaystyle \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} m_i m_j \beta_i \beta_j \, \gamma_{i,j}}{\displaystyle \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} m_i m_j \beta_i \beta_j} .$$

The expression $\gamma$ serves to characterize the mutual
relationship of time ordered series of fundamental probabilities
$p_{i,k}$, hence also of relative frequencies $y_{i,k}$, which may
be considered as approximations to $p_{i,k}$. If we give the name
“syndromy” to such an array of simultaneously distinct fundamental
probabilities (or relative frequencies), we may call $\gamma$ a
“coefficient of syndromy.” For $\gamma = 1$, we shall speak of
“isodromy,” for $1 > \gamma > 0$, of “homodromy,” for $\gamma = 0$,
of “paradromy,” and for $\gamma < 0$, of “antidromy.” We may include
the last three cases, namely $\gamma < 1$, under the name
“anisodromy.”

With the help of $\gamma$ we can exhibit the relation between
$\beta_0$ on the one hand and the $n$ values
$\beta_1, \beta_2, \ldots, \beta_n$ on the other hand as follows:

$$m_0^2 \beta_0^2 = \sum_{i=1}^{n} m_i^2 \beta_i^2 + \gamma \left\{ \Big( \sum_{i=1}^{n} m_i \beta_i \Big)^2 - \sum_{i=1}^{n} m_i^2 \beta_i^2 \right\} . \tag{10}$$

Since $m_0 = \sum_{i=1}^{n} m_i$, we find for $\gamma = 1$, from (10),

$$\beta_0 = \frac{\sum_{i=1}^{n} m_i \beta_i}{\sum_{i=1}^{n} m_i} \tag{11}$$

and for $\gamma < 1$

$$\beta_0 < \frac{\sum_{i=1}^{n} m_i \beta_i}{\sum_{i=1}^{n} m_i} . \tag{12}$$

Hence, only in the case of isodromy is the assumption justified that
the relative essential fluctuation component for the total aggregate
is as large as that for the partial aggregates. In every other case,
namely for anisodromy, the relative essential component for the
total aggregate falls below the level for the partial aggregates
more and more as $\gamma$ becomes less and less.
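
How the coefficient of syndromy ties the total aggregate to its
parts can be illustrated numerically. The sketch below assumes that
each part has a constant size $s_i$, that the relative deviations are
$(p_{i,k} - p_i)/p_i$, and that all averages over the $z$ years use
the divisor $z$; the function name `syndromy` and the probabilities
are hypothetical.

```python
from math import sqrt

def syndromy(p, s):
    """p[i][k]: probability for part i in year k; s[i]: size of part i.
    Returns (m, beta, gamma, beta0)."""
    n, z = len(p), len(p[0])
    p_bar = [sum(row) / z for row in p]
    m = [s[i] * p_bar[i] for i in range(n)]
    # relative deviations of each part from its own mean
    eps = [[(p[i][k] - p_bar[i]) / p_bar[i] for k in range(z)]
           for i in range(n)]
    beta = [sqrt(sum(e * e for e in row) / z) for row in eps]
    num = den = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            cross = sum(eps[i][k] * eps[j][k] for k in range(z)) / z
            g_ij = cross / (beta[i] * beta[j])   # pairwise coefficient
            weight = m[i] * m[j] * beta[i] * beta[j]
            num += weight * g_ij
            den += weight
    gamma = num / den                            # weighted mean
    # total aggregate: p_0k is the size-weighted mean of the p_ik
    s0 = sum(s)
    p0 = [sum(s[i] * p[i][k] for i in range(n)) / s0 for k in range(z)]
    p0_bar = sum(p0) / z
    beta0 = sqrt(sum(((v - p0_bar) / p0_bar) ** 2 for v in p0) / z)
    return m, beta, gamma, beta0

# Hypothetical data: two parts observed over three years.
m, beta, gamma, beta0 = syndromy(
    [[0.010, 0.012, 0.011], [0.020, 0.019, 0.021]], [1000, 2000])
lhs = sum(m) ** 2 * beta0 ** 2
sq = sum((mi * bi) ** 2 for mi, bi in zip(m, beta))
rhs = sq + gamma * (sum(mi * bi for mi, bi in zip(m, beta)) ** 2 - sq)
print(gamma, lhs, rhs)   # lhs and rhs agree up to rounding, as in (10)
```

Under these assumptions relation (10) is an algebraic identity, so
the two sides agree for any input, and $\beta_0$ can reach the
weighted mean of the $\beta_i$ only when $\gamma = 1$.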

In the suicide example under consideration we have homodromy, which
is reasonable, since the fluctuations in suicide frequency in the
single states are influenced in part by factors which are not local
but general for all Germany. Somewhat tedious calculations give
$\gamma = 0.38$. At the same time we find $\beta_0 = 0.0246$
approximately, while the average for $\beta_i$, $i = 1$ to 40, is
0.0392.

If now we group the 40 states into five groups so that states
numbered 1 to 8 form the first group, states numbered 9 to 16 the
second, and so on, we find as average values of $\beta_i$: 0.0354,
0.0358, 0.0485, 0.0528 and 0.0767. The quantities $\beta_i$ then show
a tendency to increase as $x_i$ (or $m_i$) decreases.
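
As a rough consistency check on the German figures, the value of
$Q_0$ implied by $\beta_0$ can be computed. The sketch assumes
formula (8) in the form $E(Q^2) = 1 + \frac{z}{z-1} m \beta^2$ and
uses the observed $x_0 = 13173$ in place of $m_0$; the function name
is illustrative only.

```python
from math import sqrt

def implied_q(m, beta, z):
    """Dispersion coefficient implied by m, beta and z when the
    underlying probability is small enough to neglect 1 - p."""
    return sqrt(1 + z / (z - 1) * m * beta ** 2)

q0 = implied_q(m=13173, beta=0.0246, z=10)
print(round(q0, 2))  # 3.14
```

The result agrees with the dispersion coefficient of 3.14 reported
for the whole of Germany.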

If, as in this example, the total aggregate is a “natural unit,” we
should expect to have homodromy in the vast majority of cases. On
the other hand, we should expect paradromy if the total aggregate is
an “artificial unit,” that is, one made up by throwing together
entirely unrelated groups. As an illustration of paradromy we take
the array of marriage frequencies for the six cities, Barcelona,
Birmingham, Boston, Leipzig, Melbourne and Rome, for the decade
1899-1908. By marriage frequency we mean the ratio of the number
married (twice the number of marriages) to population.

For the six cities taken as a whole, with a total population of
about three million, the marriage frequency $y_{0,k}$ varies between
18.00 and 19.02 per cent with an average of 18.38 per cent. The
dispersion coefficient $Q_0$ is 3.17. For the six cities taken singly
in the above order, each with a population of about half a million,
the values of $Q_i$ are 2.69, 4.32, 4.17, 2.88, 3.76 and 2.72, with
an average 3.42, somewhat higher than $Q_0$. This result is a direct
contradiction of the statement of Lexis that a narrowing field of
observation reduces the value of $Q$. Lexis, without giving the
matter much thought, worked with the hypothesis that isodromy, or at
least a decided homodromy, always existed. In our example, however,
we have paradromy, if not antidromy, for we find $\gamma$ to be
-0.054. Corresponding to this, we have $\beta_0$ less than each of
the values $\beta_1$ to $\beta_6$, for $\beta_0$ approximates 0.0167
while $\beta_i$, $i = 1$ to 6, lies between 0.0334 and 0.0563. The
quadratic mean of these quantities is 0.0450.

It is of prime interest to investigate for paradromy the theoretical
relation of $\beta_0$ to the quadratic mean of the values
$\beta_1, \beta_2, \ldots, \beta_n$, and of $Q_0$ to the quadratic
mean of $Q_1, Q_2, \ldots, Q_n$, for the case
$m_i = \text{const.} = m$. In this case, $m_0 = nm$, and if 0 is
substituted for $\gamma$ in (10) we have

$$\beta_0^2 = \frac{1}{n^2} \sum_{i=1}^{n} \beta_i^2, \qquad \text{whence} \qquad \beta_0 = \frac{1}{\sqrt{n}} \sqrt{\frac{1}{n} \sum_{i=1}^{n} \beta_i^2} .$$

At the same time, we find on the one hand, from (9), the expected
value

$$Q_0^2 = 1 + \frac{z}{z-1} \, m_0 \beta_0^2 ,$$

or

$$Q_0^2 = 1 + \frac{z}{z-1} \, \frac{m}{n} \sum_{i=1}^{n} \beta_i^2 ,$$

and on the other hand

$$\frac{1}{n} \sum_{i=1}^{n} Q_i^2 = 1 + \frac{z}{z-1} \, \frac{m}{n} \sum_{i=1}^{n} \beta_i^2 ,$$

whence

$$Q_0 = \sqrt{\frac{1}{n} \sum_{i=1}^{n} Q_i^2} .$$

In the marriage frequency example, where the quantities $m_i$,
though not equal, differ very little from one another, we have the
values already found

$$\beta_0 = 0.0167 \qquad \text{and} \qquad Q_0 = 3.17$$

to compare with the values

$$\frac{1}{\sqrt{n}} \sqrt{\frac{1}{n} \sum_{i=1}^{n} \beta_i^2} = 0.0184$$

and

$$\sqrt{\frac{1}{n} \sum_{i=1}^{n} Q_i^2} = 3.49 .$$

The differences 0.0167 - 0.0184 = -0.0017 and 3.17 - 3.49 = -0.32
are explained partly by the fact that the assumption
$m_i = \text{const.}$ is not exactly in accord with the facts, and
partly because paradromy is really not present as assumed, but only
a weak antidromy. This last should, however, be considered as due to
chance. The artificial character of a total aggregate shows itself
in paradromy.
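
The figures of the marriage example can be reproduced from the six
values of $Q_i$ and the quadratic mean of the $\beta_i$ quoted in
the text. The sketch below does only this arithmetic; the variable
names are illustrative.

```python
from math import sqrt

q = [2.69, 4.32, 4.17, 2.88, 3.76, 2.72]   # Q_i for the six cities
n = len(q)

mean_q = sum(q) / n                            # arithmetic mean of Q_i
quad_mean_q = sqrt(sum(v * v for v in q) / n)  # quadratic mean of Q_i

beta_quad_mean = 0.0450                        # quadratic mean of beta_i
beta0_paradromy = beta_quad_mean / sqrt(n)     # expected beta_0 if gamma = 0

print(round(mean_q, 2))           # 3.42
print(round(quad_mean_q, 2))      # 3.49
print(round(beta0_paradromy, 4))  # 0.0184
```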

Of the two quantities $Q$ and $\beta$, only the latter can be
considered as a proper measure of the stability of a statistical
frequency, or more exactly, of the corresponding fundamental
probability. And, since on account of formulas (11) and (12), the
total aggregate can never show a higher value of $\beta$ than the
average for the partial aggregates (because the upper limit for
$\gamma$ is 1), we obtain a glimpse of the question of the
connection between stability and homogeneity.

The idea of homogeneity as we here understand it has reference to
the result of the decomposition of a statistical aggregate according
to some attribute or complex of attributes. The aggregate may consist
of $s$ elements, say $s$ human beings, and the decomposition may
yield $N$ sub-aggregates containing $s', s'', \ldots$ elements. Let
some event $A$ be observed $x$ times in the total aggregate and
$x', x'', \ldots$ times in the sub-aggregates. If we find the
relative frequencies

$$y = \frac{x}{s}, \qquad y' = \frac{x'}{s'}, \qquad y'' = \frac{x''}{s''}, \qquad \ldots$$

then, on account of the two identities $s' + s'' + \cdots = s$ and
$x' + x'' + \cdots = x$, we have the relation

$$y = \frac{s'}{s} \, y' + \frac{s''}{s} \, y'' + \cdots$$

The “general frequency” then appears as the weighted arithmetic mean
of the “special frequencies,” $y', y'', \ldots$

The theory of probabilities, with more or less assurance, furnishes
us a criterion for deciding whether or not the deviations of the
quantities $y', y'', \ldots$ from $y$ are due to chance. If they are
not due to chance we say that the total aggregate “reacts” to the
decomposition in question and that the attribute or complex of
attributes which governs the decomposition is “relevant.” If they
are due to chance, we say that the total aggregate does not react to
the decomposition and that the attribute is “indifferent.”

According to the standpoint of the theory of probability, the
relative frequencies $y, y', y'', \ldots$ as also the quotients
$s'/s, s''/s, \ldots$ can be considered as approximations of distinct
probabilities. If we designate the two series of probabilities thus
inferred by $p, p', p'', \ldots$ and $g', g'', \ldots$ respectively,
we find

$$p = g' p' + g'' p'' + \cdots \tag{13}$$

and the character of the attribute in question as relevant or
indifferent finds expression in the fact that the “special
probabilities” $p', p'', \ldots$ either differ from one another or
are all equal to $p$, the “general probability.”

For every ample enough complex of attributes we can imagine the
decomposition going on and on by applying one attribute of the
complex after another. Finally a point is reached where the
sub-aggregates no longer react to further decomposition, or,
expressed otherwise, the supply of relevant attributes is exhausted,
and the probabilities $p', p'', \ldots$ which are associated with
these sub-aggregates are called “elementary probabilities.” In this
case we say that the sub-aggregates themselves are “completely
homogeneous” with reference to the event $A$.

The total aggregate, still in reference to $A$, is the more
diversified the more the elementary probabilities $p', p'', \ldots$
differ among themselves, that is, the more they differ from $p$. It
is reasonable to take as a measure of this diversity the expression
$\delta$, defined by

$$\delta^2 = g' (p' - p)^2 + g'' (p'' - p)^2 + \cdots \tag{14}$$

Diversity and homogeneity are antithetical notions; the more
undiversified the aggregate, the more it is homogeneous, and vice
versa.


In order to apply this view of homogeneity, now considered for
itself, to the procedure and the examples which we have brought
forward in the discussion of stability, we must disregard the time
fluctuations of the probabilities in question. That is, we do not
use the quantities $p_{i,k}$, but fix attention on the probabilities
$p_i$ which refer to an individual time interval of n partial
intervals, say a decade. By carrying out repeatedly the decomposition
according to formula (13), the quantities $p_i$, $p_0$ not included,
may be expressed in the form

$$p_i = g_i' p_i' + g_i'' p_i'' + \cdots$$

where $p_i', p_i'', \ldots$ are elementary probabilities.
Corresponding to formula (14), we have

$$\delta_i^2 = g_i' (p_i' - p_i)^2 + g_i'' (p_i'' - p_i)^2 + \cdots \tag{15}$$


If we designate the proportion of the $i$-th partial aggregate to
the total aggregate by $c_i$, that is, if we let $c_i = s_i / s_0$,
we find

$$p_0 = \sum_{i=1}^{n} c_i p_i$$

and at the same time

$$\delta_0^2 = \sum_{i=1}^{n} c_i \left\{ g_i' (p_i' - p_0)^2 + g_i'' (p_i'' - p_0)^2 + \cdots \right\} . \tag{16}$$


The number of summands in (16) is $nN$, since there are $n$ partial
aggregates and each of these is a totality of $N$ sub-aggregates. It
may easily occur that some of the $nN$ elementary probabilities are
equal, and this is expected in connection with elementary
probabilities which are associated with similar sub-aggregates. But
even in the most extreme case, where the elementary probabilities are
equal without exception, we cannot say that the probabilities $p_i$
are all alike. This can occur only when the values
$g_i', g_i'', \ldots$ are independent of $i$. This highly improbable
case is excluded from our discussion. We have then

$$\sum_{i=1}^{n} c_i (p_i - p_0)^2 > 0 . \tag{17}$$


From (15) and (16), we have the following:

$$g_i' (p_i' - p_0)^2 + g_i'' (p_i'' - p_0)^2 + \cdots = \delta_i^2 + (p_i - p_0)^2 ,$$

$$\delta_0^2 = \sum_{i=1}^{n} c_i \delta_i^2 + \sum_{i=1}^{n} c_i (p_i - p_0)^2 ,$$

so that, on account of (17),

$$\delta_0^2 > \sum_{i=1}^{n} c_i \delta_i^2 ,$$

and a fortiori

$$\delta_0 > \sum_{i=1}^{n} c_i \delta_i . \tag{18}$$

The total aggregate is then under all circumstances less homogeneous
than the partial aggregates are on the average.

This statement might possibly correspond to the every-day meaning of
the word “homogeneity,” which carries with it no precise
quantitative idea. Indeed, when we consider that in the case of the
total aggregate we have to take into account not only the lack of
homogeneity within the partial aggregates, but also the diversity
with which the partial aggregates may make up the whole, we are
inclined to say that the total aggregate is less homogeneous than
any of its parts. With that idea, however, we do not hit upon the
right thing as far as our mathematical criterion of homogeneity is
concerned. The inequality (18) says only that the average of the
values $\delta_1, \delta_2, \ldots, \delta_n$ is less than
$\delta_0$, not that each one is less than $\delta_0$.
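
The decomposition behind inequality (18) can be illustrated with a
small numerical example. The sketch assumes the diversity measure
$\delta^2 = \sum g (p' - p)^2$ of formula (14) and a total aggregate
whose sub-aggregate weights are $c_i g_i$; the weights and
probabilities are hypothetical, and the function name `diversity` is
illustrative only.

```python
from math import sqrt

def diversity(g, p):
    """Weighted mean and diversity delta for weights g (summing to 1)
    and elementary probabilities p."""
    mean = sum(gi * pi for gi, pi in zip(g, p))
    delta2 = sum(gi * (pi - mean) ** 2 for gi, pi in zip(g, p))
    return mean, sqrt(delta2)

# Hypothetical data: two partial aggregates, two sub-aggregates each.
c = [0.4, 0.6]                       # shares of the partial aggregates
g = [[0.5, 0.5], [0.25, 0.75]]       # shares within each partial aggregate
p = [[0.10, 0.20], [0.30, 0.50]]     # elementary probabilities

parts = [diversity(g[i], p[i]) for i in range(len(c))]
p_i = [mean for mean, _ in parts]
delta_i = [d for _, d in parts]

# Total aggregate: every sub-aggregate enters with weight c_i * g_i.
g0 = [c[i] * gij for i in range(len(c)) for gij in g[i]]
p_flat = [pij for row in p for pij in row]
p0, delta0 = diversity(g0, p_flat)

within = sum(ci * d ** 2 for ci, d in zip(c, delta_i))
between = sum(ci * (pi - p0) ** 2 for ci, pi in zip(c, p_i))
average_delta = sum(ci * d for ci, d in zip(c, delta_i))

print(delta0 ** 2, within + between)  # equal up to rounding
print(delta0, average_delta)          # delta0 exceeds the average
```

In this example both partial aggregates have a diversity below
$\delta_0$; the inequality itself guarantees this only for their
weighted average.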

In our foregoing discussion of stability as measured by the 
relative essential fluctuation component, we found that for the 
total aggregate the stability was higher than the average for the 
partial aggregates, except for the case of isodromy, which in prac- 
tice rarely occurs. Hence, there exists between homogeneity and 
stability an antagonistic relation—small homogeneity goes hand 
in hand with great stability. For example, the provinces into 
which a country may be divided will show, on the average, a 
greater homogeneity and at the same time a lesser stability in 
reference to an event A than will the country taken as a whole. 






























Again, the districts into which the provinces may be divided will
on the average show a greater homogeneity associated with a
still smaller stability. We can say that in general the homogeneity
increases with the narrowing of the field of observation,
while the stability decreases.

Is this to be considered as a warning against the all too popu- 
lar diversification of statistical material which is being more and 
more accepted in research methods? Not in the least. That 
would be an obsolete point of view, as if the problem of statistics
consisted in a search for most stable values. Rather does the 
opposition between homogeneity and stability give direction to 
business practice, especially to that branch of business which is 
in such close touch with statistics, namely insurance, where sta- 
bility is of prime importance. It has been known for a long 
time that it contributes to the even tenor of the business side 
if the risks are as heterogeneous as possible. It is of advantage 
if the insured persons or things are spread relatively widely ac- 
cording to geographical and other points of view, instead of con- 
centrating on a limited territory or few kinds of risks. 

Accordingly, even if this thesis, that an antagonistic relation 
exists between homogeneity and stability, seems surprising and 
strange, we find on closer consideration that the theory agrees 
with a practice which has instinctively grasped the true situation. 
It is now twelve years since I had the first opportunity to explain 
at greater length than here the foregomg developed ideas and 
with the verifying data to present them to my colleagues.’ As 
far as I know, only one of these has taken a definite stand in 
the matter. This is John Maynard Keynes.2 He makes the 
charge against me, that instead of clearing up a very simple mat- 
ter, I have befogged it with a profusion of mathematical formulas 


*Homogeneitat und Stabilitat in der Statistik, in the Skandinavisk Akiu- 
arietidskrift, 1918, pages 1-81, Upsala. 


*A treatise on probability, London, 1921, pages 403-405. 
































20 RELATIONS BETWEEN STABILITY AND HOMOGENEITY 





and new technical terms, and he believed that he could show this 
best by an example of my own from the field of insurance. In 
referring to this example, Keynes thought that the distinction 
made by myself in a much earlier publication between a gen- 
eral probability ,o and the special probabilities p,, P,, . - « 
was the one in question, where 


p- 3 p+ 


- 
; 


Keynes further expressed himself as follows: 





“If we are basing our calculations on p and do not 
know p,, P>, etc., then these calculations are more 
likely to be borne out by the result if the instances are 
selected by a method which spreads them over all the 
groups 1, 2, etc., than if they are selected by a method 
which concentrates them on group 1. In other words 
the actuary does not like an undue proportion of his 
cases to be drawn from a group which may be subject to 
a common relevant influence for which he has not allowed. 
If the a priori calculations are based on the average over 
a field which is not homogeneous in all its parts, greater 
stability of result will be obtained if the instances are 
drawn from all parts of the non-homogeneous total field, 
than if they are drawn now from one homogeneous sub- 
field and now from another. This is not at all para- 
doxical. Yet I believe, though with hesitation, that this 
is all that Von Bortkiewicz’s elaborately supported math- 
ematical conclusion amounts to.” 


Suppose, for example, that a fire insurance company insures 


¹ Here $z$ refers to a series of “equally likely events,” which is broken up 
into groups of $z_1, z_2, \ldots$ equally likely events. Hence 
$z = z_1 + z_2 + \cdots$. 




two kinds of buildings, dwellings and factories, which are classified 
as different grades of fire risks, for insurance premiums which 
are not graded. The premium is to be calculated per unit on 
the supposition that the risks in the two categories are divided 
in a definite proportion. Then, according to Keynes, a greater 
stability in the business is guaranteed if every year dwellings as 
well as factories are insured, than if in one year only dwellings 
and in another year only factories are insured. This is certainly 
true and requires no lengthy argument. But it has nothing what- 
ever to do with my thesis of the antagonistic relation between 
stability and homogeneity. 

To give an example which does illustrate my theory, think 
of three insurance companies, A, B, and C. A insures only 
dwelling houses, B only factories, while C insures both. The 
premiums in A, B, and C are different because of the different 
classes of risks. It is assumed in C that there is no grading of 
premiums. A premium per unit is charged which is calculated 
according to the relative number of the two risks. The premium 
is to be just high enough so that for a period of years, allowing 
for variations due to chance, the damages are just covered. In 
the course of this period, the danger of fire varies from year to 
year, showing gains in some years, losses in others. Such fluctuations 
of fire hazard would correspond in my scheme to the variations 
of the probabilities $p_{ik}$ with respect to $k$, while $p_{1k}$ 
is associated with A, $p_{2k}$ with B, and $p_k$ with C. And in 
accord with my theory that, except in the case of isodromy, the 
values $p_k$, relatively speaking, show weaker variations than 
$p_{1k}$ and $p_{2k}$ do on the average, the insurance company C 
would show relatively smaller fluctuations of fire damage from 
one year to another, resulting in a more stable business than 
would be shown by the average of A and B. The mixed character 
of the risks would be conducive to greater stability. In the 
case of C a certain compensation of effects would take place 







which the time variations of the two-sided fundamental probabilities 
would make manifest on the business side.¹ But Keynes 
says nothing of these variations. He simply missed the point of 
my argument and his remarks were not relevant. 
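
The three-company example can be made concrete with a small numerical sketch. Everything in it is an assumption made for illustration: the yearly fire probabilities are invented, the composition of C is fixed at 70 per cent dwellings and 30 per cent factories, and "relative fluctuation" is read as the standard deviation divided by the mean, which may not be the measure used in the original theory.

```python
from statistics import mean, pstdev

# Hypothetical yearly fire probabilities (illustrative numbers, not from the paper).
dwellings = [0.010, 0.012, 0.008, 0.011]   # risks insured by company A
factories = [0.040, 0.036, 0.044, 0.038]   # risks insured by company B

# Assumed fixed composition of the mixed company C.
share_dwellings, share_factories = 0.7, 0.3

# Yearly probability for C: a weighted mixture of the two special probabilities.
mixed = [share_dwellings * a + share_factories * b
         for a, b in zip(dwellings, factories)]

def relative_fluctuation(series):
    """Standard deviation over the years, divided by the mean level."""
    return pstdev(series) / mean(series)

cv_a = relative_fluctuation(dwellings)
cv_b = relative_fluctuation(factories)
cv_c = relative_fluctuation(mixed)
print(cv_a, cv_b, cv_c)
```

With these invented figures the two hazards happen to move in opposite directions, so the compensation in C is strong; series that rise and fall in exact proportion would show no such gain.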

It is to be hoped that the new exposition of my theory, 
although, or because, it is essentially shorter than the older one, 
will give no cause for a similar misunderstanding. 


¹ This compensation would also appear in the more complicated case where 
the proportions of the risks in C are not unchangeable as is assumed in 
the text, but would change from year to year (the premium being adjusted 
accordingly). We need not go further into this matter because, in my 
theory, the composition of $s_k$ out of the component parts $s_{ik}$ is 
considered as fixed. In my examples, this composition varied, but the 
fluctuations were insignificant in comparison to the variations of the values 
$p_{ik}$. See Skandinavisk Aktuarietidskrift, pages 69-70. 





BAYES’ THEOREM 


An Expository Presentation* 


By 


Edward C. Molina 


American Telephone and Telegraph Company 


Bayes’ theorem made its appearance as the ninth proposition 
in an essay which occupies pages 370 to 418 of the Philosophical 
Transactions, Vol. 53, for 1763. An introductory letter written 
by Richard Price, “Theologian, Statistician, Actuary and Political 
Writer,”¹ begins thus: 


“I now send you an essay which I have found 
amongst the papers of our deceased friend, Mr. Bayes, 
and which, in my opinion, has great merit, and well 
deserves to be preserved.” 


A few lines further on Price says: 


“In an introduction which he has writ to this Essay, 
he says, that his design at first in thinking on the subject 
of it was, to find out a method by which we might judge 
concerning the probability that an event has to happen, in 
given circumstances, upon supposition that we know 


*Read before the American Statistical Association during the meeting of the 
American Association for the Advancement of Science in Cleveland, Ohio, 
December, 1930. 

¹ These titles are associated with the name of Price in the frontispiece 
portrait of him bound with the December, 1928, issue of Biometrika. 
















nothing concerning it but that, under the same circum- 
stances, it has happened a certain number of times, and 
failed a certain other number of times.” 


























“Every judicious person will be sensible now that the 
problem mentioned is by no means merely a curious spec- 
ulation in the doctrine of chances, but necessary to be 
solved in order to assure a foundation for all our reason- 
ings concerning past facts, and what is likely to be here- 
after.” 


No one will dispute the importance ascribed to Bayes’ problem 
by Price; in fact, a paper by Karl Pearson on an extension of 
Bayes’ problem is entitled “The Fundamental Problem of Prac- 
tical Statistics.” Opinions differ, however, as to the validity and 
significance of the solution submitted in the essay for the problem 
in question. In view of this situation I shall limit myself today 
to an exposition of the fundamental characteristics of the problem 
Bayes’ theorem deals with and shall give no consideration to 
its interesting applications. 

The exposition may be outlined as follows: after specifying 
the class of problems to which Bayes’ theorem pertains, I shall: 

I. Discuss briefly two problems, each of which will empha- 
size one of two kinds of a priori probabilities which should be con- 
stantly borne in mind when Bayes’ theorem is under consideration, 

II. Partially analyze a certain ball-drawing problem which 
will not only serve as an introduction to the algebra of Bayes’ 
theorem but will later help to throw light on its significance, 

III. Present Bayes’ problem and the related theorem. 

IV. Make some remarks on the value of the theorem and 
the controversies which it raised. 

In carrying out this plan I shall find it convenient to ignore 
the historic order of events. 


When probability is the subject under consideration one anticipates 
problems such as: A coin is about to be tossed 15 times; 
what is the probability that heads will turn up seven times? A 
sample of 100 screwdrivers is to be taken from a case containing 
1000 screwdrivers of which 300 are known to be defective; what 
is the probability that the sample will contain 25 defectives? 
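
The two direct problems just stated can be worked numerically. The sketch below assumes a fair coin (probability 1/2 of heads) and a sample drawn without replacement; the function names are illustrative only.

```python
from math import comb

def binomial_pmf(successes, trials, p):
    """Probability of exactly `successes` in `trials` independent trials."""
    return comb(trials, successes) * p**successes * (1 - p)**(trials - successes)

def hypergeometric_pmf(defective_in_sample, sample_size, defective_total, lot_size):
    """Probability of a given number of defectives in a sample drawn
    without replacement from a lot with a known number of defectives."""
    good_total = lot_size - defective_total
    good_in_sample = sample_size - defective_in_sample
    ways = comb(defective_total, defective_in_sample) * comb(good_total, good_in_sample)
    return ways / comb(lot_size, sample_size)

# Seven heads in 15 tosses of a coin assumed to be fair.
coin = binomial_pmf(7, 15, 0.5)

# 25 defectives in a sample of 100 from 1000 screwdrivers, 300 of them defective.
screwdrivers = hypergeometric_pmf(25, 100, 300, 1000)

print(f"{coin:.4f}")          # 0.1964
print(f"{screwdrivers:.4f}")
```

The first figure is 6435/32768, which is close to 1/5.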

These are direct, or a priori, probability problems. In each 
of them the nature of a game, or an experiment, is specified in 
advance and then a question is asked relating to one, or more, of 
the possible outcomes of the game or experiment. Problems of 
this type have occupied the attention of mathematicians since the 
days of Pascal and Fermat, the creators of the mathematical theory 
of probability. 

An inverse class of problems of great practical significance, 
called a posteriori probability problems, came into prominence with 
the publication of Bayes’ essay. In these we find specified the re- 
sult or outcome of a game which has been played, whereas the 
question then asked is whether the game actually played was one 


or some other of several possible games. This type of problem 
is usually stated as follows: 


“An event has happened which must have arisen from 
some one of a given number of causes ; required the prob- 
ability of the existence of each of the causes.” 


I 


Consider this example: During his sophomore year Tom 
Smith played on both the baseball and football varsity teams; 
we have been informed that he broke his ankle in one of the 
games ; what are the a posteriori probabilities in favor of baseball 
and football, respectively, as the baneful cause of the accident? 

Evidently the answer depends on the number of baseball and 
football games played during their respective seasons and also on 
the likelihood of a man breaking an ankle in one or the other of 










these two games. As a concrete case assume that: 

1. At Smith’s college an equal number of baseball and football 
games are played per season; 

2. Statistical records indicate that if a student participates in a 
baseball game the probability is 2/100 that he will break an 
ankle and that, likewise, the probability is 7/100 for the same 
contingency in a football game. 

In view of the first of these two assumptions our conclu- 
sions as to the cause of the accident may be based entirely on the 
information contained in the second assumption. The odds are 
two to seven, so that the a posteriori probabilities regarding the 
two admissible causes are: 


For baseball, 2/(2+7) = 2/9. 
For football, 7/(2+7) =7/9. 


Now consider this other example. A lone diner amused him- 
self between courses by spinning a coin. We elicited from the 
waiter that in 15 spins heads turned up seven times. Moreover, 
from our point of observation, the size of the coin indicated that 
it was either a silver quarter or a ten-dollar gold piece. What are 
the a posteriori probabilities in favor of the silver quarter and the 
gold piece, respectively ? 

If the lone diner were a professor from one of our eastern 
universities we would not hesitate a moment in declaring that the 
coin spun was a quarter. But it happens that the gentleman was 
a member of the Cleveland Chamber of Commerce, dining at the 
Bankers’ Club. We must, therefore, give the matter more careful 
consideration. The number of quarters and gold pieces usually 
carried by a banker and the probabilities of obtaining the observed 


result by spinning coins are relevant; let us assume, therefore, 
that : 


1. The small change purse of a Cleveland financier contains, on 
the average, ten-dollar gold pieces and quarters in the ratio of 
eight to three. 


Moreover, we may assume (in fact we know) that: 


2. If either a quarter or a gold piece is spun 15 times, the probability 
that heads will turn up seven times is approximately 1/5. 


The second of these two items of information makes the a 
posteriori probabilities depend entirely on the first item. Clearly 
the odds are eight to three and we conclude: 


For a quarter, a posteriori probability = 3/(3+8) = 3/11. 
For a gold piece, a posteriori probability = 8/(3+8) = 8/11. 


Now regarding the general a posteriori problem, 


“An event has happened which must have arisen 
from some one of a number of causes ; required the prob- 
ability of the existence of each of the causes,” 


what do the two examples we have just considered suggest? In 
both problems we inquired into: 


1. The frequency with which each of the possible causes is met 
BEFORE THE OBSERVED EVENT HAPPENED. This frequency 
is called the a priori existence probability for the corresponding 
cause. 

2. The probability that a cause, if brought into play, would re- 
produce the observed event. This probability will hereafter 
be referred to as the a priori productive probability for the 
cause in question. 




In the case of the broken ankle, the a priori existence probabilities 
were equal and took no part in our conclusion; we based 
the a posteriori probabilities entirely on the a priori productive 
probabilities. We did just the opposite with reference to the coin 
spun by the Cleveland financier; on account of the equality of the 
a priori productive probabilities we deduced a posteriori probabilities 
in terms of the unequal a priori existence probabilities. 

It is apparent that our two examples represent extreme cases. 
In general, the solution of an inverse or a posteriori problem, in- 
volving a number of causes, one of which must have brought about 
a certain observed event, depends on both sets of direct, or a priori 
probabilities. Those of the first set give the frequency with which 
the various causes were to be expected before the observed result 
occurred ; those of the second set give the frequencies with which 
the observed result would follow from the various causes if each 
were brought into play. 
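
The rule just described (weight each cause by its existence probability times its productive probability, then normalize) can be sketched in a few lines. The function name `posterior` and the dictionary layout are illustrative choices; the productive probability for the coin is taken as the exact fair-coin value rather than the rounded 1/5.

```python
from fractions import Fraction
from math import comb

def posterior(existence, productive):
    """A posteriori probabilities from the two sets of a priori probabilities."""
    weights = {cause: existence[cause] * productive[cause] for cause in existence}
    total = sum(weights.values())
    return {cause: weight / total for cause, weight in weights.items()}

# Broken ankle: equal existence probabilities, unequal productive probabilities.
ankle = posterior(
    existence={"baseball": Fraction(1, 2), "football": Fraction(1, 2)},
    productive={"baseball": Fraction(2, 100), "football": Fraction(7, 100)},
)

# Spun coin: unequal existence probabilities, equal productive probabilities.
seven_heads = Fraction(comb(15, 7), 2**15)
coin = posterior(
    existence={"quarter": Fraction(3, 11), "gold piece": Fraction(8, 11)},
    productive={"quarter": seven_heads, "gold piece": seven_heads},
)

print(ankle)   # baseball 2/9, football 7/9
print(coin)    # quarter 3/11, gold piece 8/11
```

In each extreme case one of the two sets cancels from numerator and denominator, which is why only the other set appears in the answer.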


II 


Bearing in mind the two distinctly different sets of a priori 
probabilities required in arriving at a posteriori conclusions re- 
garding the possible causes of an observed event, we must now 
give some thought to the algebra of the subject before taking up 
Bayes’ problem and theorem. For this purpose consider the fol- 
lowing bag problem: 

A bag contained $M$ balls, of which an unknown number 
were white. From this bag $N$ balls were drawn and of these $T$ 
turned out to be white. What light does this outcome of the 
drawings throw on the unknown ratio of the number of white 
balls to the total number of balls, $M$, in the bag? Let $x$ be 
this unknown ratio. 


Two cases of this problem may be considered: 


Case 1.—After a ball was drawn it was replaced and the bag was 
shaken thoroughly before the next drawing was made. 
Case 2.—A drawn ball was not replaced before the next drawing. 


These two cases become essentially identical when the total 
number of balls in the bag is very large compared with the number 
drawn. Case 1 will serve as an introduction to Bayes’ problem; 
later we will find it highly desirable to consider Case 2. 


We are confronted with $(M+1)$ possible hypotheses or 
causes before the drawings took place: 


1 - the unknown value of $x$ is $x_0 = 0/M$, 
2 - the unknown value of $x$ is $x_1 = 1/M$, 
3 - the unknown value of $x$ is $x_2 = 2/M$, 
. . . . . 
$k+1$ - the unknown value of $x$ is $x_k = k/M$, 
. . . . . 
$M+1$ - the unknown value of $x$ is $x_M = M/M = 1$. 


Let $w(x_k)$ be the a priori existence probability for the $k$'th 
hypothesis; by this is meant the probability in favor of the $k$'th 
hypothesis based on whatever information was available regarding 
the contents of the bag prior to the execution of the drawings. 

Let $B(T, N, x_k)$ be the a priori productive probability 
for the $k$'th hypothesis; by this is meant the probability of obtaining 
the observed result ($T$ whites in $N$ drawings) when the 
value of $x$ is $k/M$. 

Then, the a posteriori probability, or probability after the 
observed event, in favor of the $k$'th hypothesis is¹ 


$$P_k = \frac{w(x_k)\,B(T, N, x_k)}{\sum_{i=0}^{M} w(x_i)\,B(T, N, x_i)}. \tag{1}$$


For Case 1 of our bag problem we have 


$$B(T, N, x_k) = \binom{N}{T}\, x_k^{T}\,(1 - x_k)^{N-T},$$


¹ This is the Laplacian generalization of Bayes’ formula, although in some 
textbooks it is referred to as “Bayes’ Theorem.” A relatively short demonstration 
of it is given by Poincaré in his Calcul des Probabilités. See 
also Fry, Probability and its Engineering Uses, Art. 49. 







where $\binom{N}{T}$ represents the number of combinations of $N$ 
things taken $T$ at a time. Substituting in (1), we obtain, 
after canceling from numerator and denominator the common 
factor $\binom{N}{T}$, 


$$P_k = \frac{w(x_k)\, x_k^{T}\,(1 - x_k)^{N-T}}{\sum_{i=0}^{M} w(x_i)\, x_i^{T}\,(1 - x_i)^{N-T}}. \tag{2}$$


If in equation (2) we give $k$ successively the values $a$, 
$a+1$, $a+2$, . . . $b-1$, $b$ and add the results, we obtain 


$$P(x_a, x_b) = \frac{\sum_{k=a}^{b} w(x_k)\, x_k^{T}\,(1 - x_k)^{N-T}}{\sum_{i=0}^{M} w(x_i)\, x_i^{T}\,(1 - x_i)^{N-T}} \tag{3}$$


for the a posteriori probability that the unknown ratio of white 
to total balls in the bag lies between $a/M$ and $b/M$, both 
inclusive. 
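
Formulas (2) and (3) for Case 1 can be evaluated directly. The sketch below is only an illustration: the function names are hypothetical, the figures M = 10, N = 5, T = 3 are invented, and a uniform existence probability is assumed whenever none is supplied.

```python
from fractions import Fraction

def bag_posterior(M, N, T, prior=None):
    """Formula (2): a posteriori probability of each ratio x_k = k/M
    after T white balls in N drawings with replacement."""
    if prior is None:
        # Assumed default: all M + 1 hypotheses equally likely a priori.
        prior = [Fraction(1, M + 1)] * (M + 1)
    weights = []
    for k in range(M + 1):
        x = Fraction(k, M)
        weights.append(prior[k] * x**T * (1 - x)**(N - T))
    total = sum(weights)
    return [weight / total for weight in weights]

def interval_probability(post, a, b):
    """Formula (3): probability that the ratio lies between a/M and b/M inclusive."""
    return sum(post[a:b + 1])

post = bag_posterior(M=10, N=5, T=3)
print(post.index(max(post)))              # 6, that is, x = 6/10 is most probable
print(interval_probability(post, 4, 8))
```

Exact fractions are used so that the probabilities over all hypotheses sum to one without rounding error.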


III 
BAYES’ PROBLEM 


Consider the table represented by the rectangle $ABCD$ in 
Fig. 1. On this table a line $OS$ was drawn parallel to, but at 
an unknown distance from, the edges $AD$ and $BC$. Then 
a ball was rolled on the table $N$ times in succession from the 
edge $AD$ toward the edge $BC$. As indicated in the figure 
it was noted that $T$ times the ball stopped rolling to the right 
of the line $OS$ and $N-T$ times to the left of that line. 
What light does this information shed on the unknown distance 
from $AD$ to $OS$? In more technical terms, what is 
the a posteriori probability that the unknown position of the line 
$OS$ lies between any two positions in which we may be interested? 


Fig. 1. 


Each rolling of the ball was executed in such a manner that 
the probability of the ball coming to rest to the right of $OS$ is 
given by the unknown ratio of the distance $OA$ to the length 
$BA$ of the table; likewise, the probability of the ball stopping 
to the left of $OS$ is given by the ratio of the distance $BO$ to 
the length $BA$. 


Set $x = OA/BA$, $1 - x = BO/BA$. 


The only difference between this problem and the bag of balls 
problem is that now the possible values of $x$ are not restricted 
to the finite set $0/M$, $1/M$, $2/M$, . . . $(M-1)/M$, $M/M$; 
in the table problem $x$ may have had any value whatever between 
the limits of 0 and 1. Therefore equation (3) will answer the 
question asked provided we substitute definite integrals in place 
of the finite summations. This substitution gives us, for the desired 
a posteriori probability that $x$ had a value between $x_1$ and 
$x_2$, the formula 


$$P(x_1, x_2) = \frac{\int_{x_1}^{x_2} w(x)\, x^{T}\,(1 - x)^{N-T}\,dx}{\int_{0}^{1} w(x)\, x^{T}\,(1 - x)^{N-T}\,dx}. \tag{4}$$


Equation (4) is useless until the form of the a priori existence 
function $w(x)$ is specified; this depends on the way in 
which the line $OS$ was drawn. Bayes assumed that the line 
$OS$, of unknown distance from $AD$, was drawn through the 
point of rest corresponding to a preliminary roll of the ball. This 
amounts to postulating that all values of $x$ between 0 and 1 
were a priori equally likely. In other words, with Bayes, the 
a priori existence function $w(x)$ was a constant which, therefore, 
did not have to be taken into consideration.¹ Thus, instead of 
equation (4), Bayes gave the equivalent of the following restricted 
formula: 


$$P(x_1, x_2) = \frac{\int_{x_1}^{x_2} x^{T}\,(1 - x)^{N-T}\,dx}{\int_{0}^{1} x^{T}\,(1 - x)^{N-T}\,dx}. \tag{5}$$


I say “the equivalent of” (5) because in Bayes’ day definite 
integrals were expressed in terms of corresponding areas. 

Equation (5) constitutes Proposition 9 of the essay, but is 
usually referred to as Bayes’ theorem. 
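
For whole numbers T and N the two integrals in formula (5) can be evaluated exactly, since the integrand is a polynomial once $(1-x)^{N-T}$ is expanded by the binomial theorem. The sketch below takes that route; the function names and the sample figures (T = 7, N = 15, limits 1/4 and 3/4) are illustrative assumptions, and other methods of integration would serve equally well.

```python
from fractions import Fraction
from math import comb

def beta_integral(T, N, lower, upper):
    """Exact integral of x**T * (1 - x)**(N - T) from `lower` to `upper`."""
    m = N - T

    def antiderivative(x):
        # Term-by-term integration of the binomial expansion of (1 - x)**m.
        return sum(
            Fraction((-1)**j * comb(m, j), T + j + 1) * x**(T + j + 1)
            for j in range(m + 1)
        )

    return antiderivative(Fraction(upper)) - antiderivative(Fraction(lower))

def bayes_interval(T, N, x1, x2):
    """Formula (5): a posteriori probability that x lies between x1 and x2
    when all values of x are a priori equally likely."""
    return beta_integral(T, N, x1, x2) / beta_integral(T, N, 0, 1)

probability = bayes_interval(7, 15, Fraction(1, 4), Fraction(3, 4))
print(float(probability))
```

As a check on the method, one success-free roll (T = 0, N = 1) gives probability 3/4 that x lies below 1/2, in agreement with integrating 1 - x by hand.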


¹ The existence function $w(x)$ does not appear either explicitly or implicitly 
anywhere in Bayes’ essay. This fact raises the question as to whether 
or not Bayes had any notion of the general problem of causes. 







IV. 


Equation (5) is a very beautiful formula; but we must be 
cautious. More than one high authority has insinuated that its 
beauty is only skin deep. Speaking of Laplace’s generalization 
and extension of the theorem, George Chrystal, the English mathematician 
and actuary, closed a severe attack on the whole theory 
of a posteriori probability¹ with the statement that “Practical people 
like the Actuaries, however much they may justly respect 
Laplace, should not air his weaknesses in their annual examina- 
tions. The indiscretions of great men should be quietly allowed 
to be forgotten.” 

Chrystal’s advice as to the attitude one should assume toward 
“the indiscretions of great men” is excellent, but in the case under 
consideration, it was the plaintiff rather than the defendant who 
committed indiscretions; this is discussed in a paper by E. T. 
Whittaker² entitled “On Some Disputed Questions of Probability.” 

The discussions and disputes, which began shortly after the 
birth of the formula in 1763 and which have not as yet subsided, 
may be divided into two classes: 


1. Discussions concerning problems in which it is known that the 
a priori existence function is not a constant. 

2. Discussions concerning problems in which nothing whatever 
is known concerning the a priori existence function. 


The discussions of Class 1 are out of order in so far as 
Bayes’ theorem is concerned; recourse should be had to formula 
(4), Laplace’s generalization of the Bayes’ theorem, when it is 
known that $w(x)$ is not a constant. Failure to differentiate 


¹ “On Some Fundamental Principles in the Theory of Probability,” Transactions 
of the Actuarial Society of Edinburgh, Vol. 11, No. 13. 

² Transactions of the Faculty of Actuaries in Scotland, Vol. VIII, Session 
1919-1920. 













explicitly between equations (4) and (5) has created a great deal 
of confusion of thought concerning the probability of causes. The 
discussions of Class 2 have centered on what Boole called “the 
equal distribution of our knowledge, or rather of our ignorance,” 
that is to say “the assigning to different states of things of which 
we know nothing, and upon the very ground that we know noth- 
ing, equal degrees of probability.” Regarding the legitimacy of 
this procedure Bayes himself contributed a very important scholium, 
which appeared in his essay on pages 392 and 393. The 
argument in this scholium, based on a corollary to Proposition 8 
of the essay, may be summarized as follows: 

Assuming that all values of $x$ are a priori equally likely and 
that the $N$ throws of a ball on the table have not yet been made, 
the probability that $T$ times the ball will rest to the right of $OS$ 
and that the remaining $N-T$ times it will rest to the left of 
$OS$ is (as shown in the corollary) 


$$P = \int_{0}^{1} \binom{N}{T}\, x^{T}\,(1 - x)^{N-T}\,dx = \frac{1}{N+1}, \tag{6}$$





a result in which $T$ does not appear. In other words, any assigned 
outcome for the throws is no more, or no less, likely than 
any other outcome, if a priori all values of $x$ are equally likely. 
But, wrote Bayes in the scholium, when we say that we have no 
knowledge whatever a priori regarding the ratio $x$, do we not 
really mean that we are in the dark as to what will be the outcome 
when we proceed to make $N$ throws? If so, then equation 
(6) justifies the assumption that a priori all values of $x$ are 
equally likely. 
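
The value of the integral in the corollary can be checked with the standard Beta integral for whole-number exponents. The derivation below is a sketch supplied for the reader and is not taken from Bayes' essay, where the result is obtained in terms of areas.

```latex
\int_{0}^{1} x^{T}(1-x)^{N-T}\,dx \;=\; \frac{T!\,(N-T)!}{(N+1)!},
\qquad\text{so that}\qquad
\int_{0}^{1}\binom{N}{T}\,x^{T}(1-x)^{N-T}\,dx
\;=\; \frac{N!}{T!\,(N-T)!}\cdot\frac{T!\,(N-T)!}{(N+1)!}
\;=\; \frac{1}{N+1}.
```

Each of the $N+1$ possible values of $T$ (namely $0, 1, \ldots, N$) thus receives the same probability.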

To clinch his argument it must be shown that the converse 
of equation (6) is true. That is, it must be shown that, if any 
outcome of throws not yet made is as likely as any other, then 







any value of x is a priori as likely as any other. This converse 
theorem was submitted to Dr. F. H. Murray, who obtained an 
elegant proof based on a theorem of Stieltjes.¹ 

In view of Bayes’ corollary and his scholium, an analysis of 
our bag problem with reference to the “equal distribution of our 
knowledge, or ignorance” is in order. 

Consider again Case 1 where each drawn ball is replaced in 
the bag before the next drawing is made. 

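Assuming each of the $(M+1)$ permissible hypotheses to be 
a priori equally likely, the probability that $N$ drawings, not yet 
made, will result in $T$ white and $N-T$ black balls is 


$$P = \frac{1}{M+1} \sum_{k=0}^{M} \binom{N}{T} \left(\frac{k}{M}\right)^{T} \left(1 - \frac{k}{M}\right)^{N-T}. \tag{7}$$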





Equation (7) is not, in general, independent of $T$,² so that 
any one assigned outcome of $N$ drawings is not as likely as any 
other outcome. This result is disturbing; at first sight it seems 
to discredit Bayes’ scholium. We must, therefore, look into the 
matter more closely. 

Bayes’ problem corresponds to drawings from a bag con- 
taining an infinite number of balls. Therefore, even if drawn 
balls are replaced, the chance of a particular ball being drawn 
more than once is zero. But when $N$ drawings with replacements 
are made from a bag containing a finite number, $M$, of 
balls, we are by no means certain of drawing $N$ different balls; 


¹ Bulletin of the American Mathematical Society, February, 1930. 
² Consider, for example, the case of $M = 2$. Equation (7) reduces 
(for $0 < T < N$) to 


$$P = \frac{1}{3} \left(\frac{1}{2}\right)^{N} \binom{N}{T},$$


a result which is not independent of $T$. 



























a particular white ball may be drawn several times over, and, likewise, 
a particular black ball may appear more than once. It is not 
surprising, therefore, that Case 1 of the bag problem does not 
confirm Bayes’ corollary. 

Consider now Case 2, where the drawn balls are not returned 
to the bag. If $k$ of the total balls are white and the rest black, 
the probability that a sample of $N$ balls from the bag will contain 
$T$ white and $N-T$ black is 


$$\binom{k}{T} \binom{M-k}{N-T} \bigg/ \binom{M}{N}.$$

Hence, if the permissible values 0, 1, 2, 3, . . . $M$ for $k$ 
are all equally likely a priori, we obtain instead of (7), 


$$P = \frac{1}{M+1} \sum_{k=0}^{M} \binom{k}{T} \binom{M-k}{N-T} \bigg/ \binom{M}{N} = \frac{1}{N+1}, \tag{8}$$


a result independent of any assigned value for $T$ and identical 
with the result in the corollary to Proposition 8 of the essay. 
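
The contrast between the two cases is easy to confirm by direct summation. The sketch below evaluates the sum for drawings with replacement and the sum for drawings without replacement, both under equally likely hypotheses; the function names and the figures M = 10, N = 4 are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb

def with_replacement(M, N, T):
    """Case 1: probability of T whites in N drawings, balls replaced,
    all M + 1 compositions of the bag equally likely a priori."""
    return sum(
        comb(N, T) * Fraction(k, M)**T * (1 - Fraction(k, M))**(N - T)
        for k in range(M + 1)
    ) / (M + 1)

def without_replacement(M, N, T):
    """Case 2: the same probability when drawn balls are not returned."""
    return sum(
        Fraction(comb(k, T) * comb(M - k, N - T), comb(M, N))
        for k in range(M + 1)
    ) / (M + 1)

M, N = 10, 4
case_1 = [with_replacement(M, N, T) for T in range(N + 1)]
case_2 = [without_replacement(M, N, T) for T in range(N + 1)]

print(case_1)   # varies with T
print(case_2)   # every entry equals 1/(N + 1) = 1/5
```

Only the sampling without replacement reproduces the uniform result 1/(N + 1) for a finite bag.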


SUMMARY: 





Bayes’ theorem is the answer to a special case of the general 
problem of causes. The special case postulates that the a priori 
existence probabilities for the various admissible causes of an observed 
event are equal. 

In the essay Bayes recommends that his theorem be adopted 
whenever we find ourselves confronted with total ignorance as 
to which one of several possible causes produced an observed 
event. To justify this recommendation Bayes takes the attitude 
that: A state of total ignorance regarding the causes of an observed 
event is equivalent to the same state of total ignorance as 
to what the result will be if the trial or experiment has not yet 
been made. This interpretation is a generalization of the fact 
that in his billiard table problem, the assumption of equal likelihood 
for all possible positions of the line $OS$ gives equal probabilities 
for the various possible outcomes of a set of $N$ ball 
rollings not yet made. 

Laplace, Poincaré and Edgeworth¹ have shown that the a 
priori existence function $w(x)$, which appears in the Laplacian 
generalization of Bayes’ theorem, is of negligible importance when 
the numbers $N$ and $T$ are large. Therefore, when this condition 
holds, one need not hesitate to use Bayes’ restricted formula 
for the solution of a problem of causes. 

The transmission, by Price, of Bayes’ posthumous essay to 
the Royal Society marked an epoch in the history of the literature 
on probability theory. As mentioned at the beginning of this 
paper, Karl Pearson has called the extension of Bayes’ problem 
the “Fundamental Problem of Practical Statistics.” 


¹ Laplace: “Oeuvres,” Vol. 9, p. 470. Poincaré: “Calcul des Probabilités,” 
2d edition, p. 255. Bowley: “F. Y. Edgeworth’s Contribution to Mathematical 
Statistics,” pp. 11 and 12. 


























ON CERTAIN PROPERTIES OF FREQUENCY 
DISTRIBUTIONS OBTAINED BY A LINEAR 
FRACTIONAL TRANSFORMATION OF THE 
VARIATES OF A GIVEN DISTRIBUTION 





By 


H. L. Rietz 


Considerable evidence has been presented by R. A. Fisher¹ 
to show that, by an appropriate transformation $z = f(r)$ of 
small sample correlation coefficients $r$ ($-1 \le r \le 1$) distributed 
in accord with a decidedly skew frequency curve, values of $z$ 
are obtained which are distributed nearly in a normal distribution. 
In fact, the approach of the distribution of $z$ to normality 
seems sufficiently rapid to justify the use of the probable error 
of $z$ in many applications as if it were normally distributed. 
Such a change in the character of the distribution of an important 
statistic suggests the further study of properties of the distribution 
of variables obtained by applying rather simple transformations 
to variates distributed from −1 to +1 in accord with a given 
frequency function. In a previous paper,² the writer has dealt 
with a similar problem when each variate of a given unimodal 
distribution of any finite range is replaced by a given power of 
the variate. 









Consider a positive unimodal continuous frequency function 






¹ Metron, Vol. 1, Part 4 (1921) pp. 3-32. 
² Proceedings of the National Academy, Vol. 13, No. 12 (1927), pp. 817-820. 




$y = \psi(x)$ of a system of variates $x_1, x_2, \ldots, x_n$ with 
a range of −1 to +1, with $\psi(-1) = \psi(1) = 0$, with a single 
mode at some point, say at $x = b$ ($-1 < b < 1$), and with the 
derivative $\psi'(x)$ continuous. More precisely, we assume 
that $\psi(x)$ is positive except at the end points of the interval 
−1 to +1, where it is zero, and that $\psi'(x)$ changes 
from positive to negative at $x = b$, and is non-negative or 
non-positive at any point $x = a$ according as $a$ is less or 
greater than $b$. 

It is the main object of the present paper to consider certain 
properties of the distribution of variates $u_i = (e x_i + f)/(g x_i + h)$ 
obtained by a linear fractional transformation of 
the $x$’s, where $e$, $f$, $g$, and $h$ are real numbers so selected 
that $u = (ex + f)/(gx + h)$ is continuous from 
$x = -1$ to $x = 1$. 

When g =0, we have the case of the linear transformation 
which simply has an effect equivalent to a change of origin and 
of unit of measurement. As we are not in the present problem 
much interested in such a simple transformation, we shall, in 
general, assume $g \ne 0$. Moreover, we take $g$ positive, since 
this involves no loss of generality. 

We shall, except as otherwise stated, restrict our considerations 
to the interval for $u$ that corresponds to $-1 \le x \le 1$, 
and to such transformations that the derivative of $u$ with respect 
to $x$ is finite for each value of $x$ and that $u$ increases 
when $x$ increases. These restrictions require that 


$$\frac{du}{dx} = \frac{he - fg}{(gx + h)^{2}},$$


where $g < |h|$ and where the determinant 


$$he - fg = \begin{vmatrix} e & f \\ g & h \end{vmatrix} > 0. \tag{1}$$













(3) 


Next, let 


$$v = \phi(u) \tag{4}$$


be the frequency function of the new variates $u$. Then we 
may write¹ 


$$v = \phi(u) = \psi\!\left(\frac{f - hu}{gu - e}\right) \cdot \frac{he - fg}{(gu - e)^{2}}. \tag{5}$$

Since $he - fg > 0$, we know that $v$ is positive throughout 
the interval in which we are interested except that $v = 0$ 
at the end points. From (5) it seems that the new distribution 
function may possibly become infinite when $u = e/g$, but the 
question then arises as to whether $e/g$ is an admissible value 
of $u$. 
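
Formula (5) can be tried on a concrete case. The sketch below rests entirely on assumed inputs: the density $\psi(x) = \tfrac{3}{4}(1 - x^2)$ with mode $b = 0$ and the constants $e = 1$, $f = 0$, $g = 1$, $h = 2$ are chosen only because they satisfy the stated restrictions, and the integral is approximated by a simple midpoint rule.

```python
def psi(x):
    """Assumed example density on [-1, 1], zero at both ends, mode at b = 0."""
    return 0.75 * (1.0 - x * x)

# Assumed constants: he - fg = 2 > 0 and g < |h|, with g positive.
e, f, g, h = 1.0, 0.0, 1.0, 2.0

def phi(u):
    """Frequency function of u = (e*x + f) / (g*x + h), following formula (5)."""
    x = (f - h * u) / (g * u - e)          # inverse transformation
    return psi(x) * (h * e - f * g) / (g * u - e) ** 2

lower = (f - e) / (h - g)                  # image of x = -1
upper = (e + f) / (g + h)                  # image of x = +1

# Midpoint-rule check that phi is a proper frequency function on [lower, upper].
steps = 20000
width = (upper - lower) / steps
total = sum(phi(lower + (i + 0.5) * width) for i in range(steps)) * width

# Slope of phi at the image of the mode, by a central difference.
mode_image = (e * 0.0 + f) / (g * 0.0 + h)
slope = (phi(mode_image + 1e-6) - phi(mode_image - 1e-6)) / 2e-6

print(total)              # close to 1
print(e / g > upper)      # True: e/g lies outside the range of u
print(slope > 0)          # True here, where g + h is positive
```

With these constants the range of $u$ runs from −1 to 1/3, so the point $u = e/g = 1$ at which (5) would become infinite is never reached.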

We shall prove that $e/g$ is not an admissible value of $u$ 
by showing that $u$ cannot take the value $e/g$ within the 
interval $u = (f - e)/(h - g)$ to $u = (e + f)/(g + h)$ 
wherein $u$ lies when $-1 \le x \le 1$. In this connection we shall 
also establish some inequalities that will be found useful in the 
consideration of certain properties of the new distribution. Consider 
first the cases in which $g + h$ is positive. 

Then since $eh > fg$, we have $eh + eg > fg + eg$. 

Divide by $g(g + h)$, and we have $\dfrac{e}{g} > \dfrac{e + f}{g + h}$. Hence, 


*cf. Annals of Mathematics, vol. 23, No. 4 (1922), pp. 293-4. 













$e/g$ is too large when $g + h$ is positive to be an admissible 
value of $u$. 

Consider next the cases in which $g + h$ is negative. In 
this case, $h < 0$ since $g > 0$. Hence $g - h > 0$. Then since 
$eh > fg$, we have $eh - eg > fg - eg$. Divide by 
the positive $g(g - h)$. This gives $-\dfrac{e}{g} > \dfrac{f - e}{g - h}$ 
and $\dfrac{e}{g} < \dfrac{e - f}{g - h} = \dfrac{f - e}{h - g}$. 
Hence, when $g + h < 0$, $e/g$ is too small to be an 
admissible value of $u$. 

To summarize with $g > 0$, we have shown that: 

(a) When $g + h$ is positive, $e/g$ is too large to be 
an admissible value of $u$. 

(b) When $g + h$ is negative, $e/g$ is too small to be 
an admissible value of $u$. 
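Returning now to the consideration of our frequency function 
$v = \psi\!\left(\dfrac{f - hu}{gu - e}\right) \cdot \dfrac{he - fg}{(gu - e)^{2}}$ in (5), we obtain 


$$\frac{dv}{du} = \frac{(he - fg)^{2}}{(gu - e)^{4}}\, \psi'\!\left(\frac{f - hu}{gu - e}\right) - \frac{2g\,(he - fg)}{(gu - e)^{3}}\, \psi\!\left(\frac{f - hu}{gu - e}\right). \tag{6}$$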


When $u$ takes the value $(eb + f)/(gb + h)$ into which 
variates at the mode $x = b$ are transformed, we know that 

$$\psi'\!\left(\frac{f - hu}{gu - e}\right) = \psi'(b) = 0.$$

By making use of the fact that $he - fg > 0$, and the propositions 
(a) and (b) relating to the inadmissibility of $e/g$ as 
a value of $u$ in an examination of the right hand member of 
(6) for $u = (eb + f)/(gb + h)$, we establish the 
following proposition in regard to the sign of the derivative 
$dv/du$ for the value of $u$ which corresponds to the 
modal value of $x$: 

When $g + h \ne 0$, $dv/du$ is positive or negative 
at $u = (eb + f)/(gb + h)$ according as $g + h$ is 
positive or negative. 










The truth of this proposition follows readily by applying 
(a) and (b) to (6), remembering that $g$ is positive and that 
$\psi'(b)$ vanishes. 

We shall show next in case $g + h > 0$, that $dv/du$ 
is non-negative for all admissible values of $u$ less than 
$(eb + f)/(gb + h)$. To see this from (6), note first 
that $\psi'[(f - hu)/(gu - e)]$ remains non-negative for 
$(f - hu)/(gu - e) < b$ or for $u$ less than 
$(eb + f)/(gb + h)$, and note second that $g/(gu - e)^{3}$ 
is negative since $e/g$ is too large to be an admissible value 
of $u$ under the condition $g + h > 0$. 

Next, in case $g + h < 0$, $dv/du$ is non-positive for 
all values of $u > (eb + f)/(gb + h)$. To see this from 
(6), note first that $\psi'[(f - hu)/(gu - e)]$ remains 
non-positive for $(f - hu)/(gu - e) > b$ or for $u > (eb + f)/(gb + h)$, 
and note second that $g/(gu - e)^{3}$ is positive when $g + h < 0$ 
because in this case $u > e/g$. 

To summarize, when $g+h \neq 0$, we state the

Theorem I. When the derivative $dv/du$ is
positive for the value of $u$ into which variates at the
modal value $x = b$ transform, then $dv/du$ is
non-negative for all smaller values of $u$. Similarly,
when $dv/du$ is negative for the value of $u$ into
which variates at the modal value $x = b$ transform,
then $dv/du$ is non-positive for all larger values
of $u$.


Finally, we wish to inquire about a modal value for
the frequency function $v = \varphi(u)$ in (5). To this end,
consider first the case in which $dv/du$ is positive at
$u = (eb+f)/(gb+h)$. At a point between
$u = (eb+f)/(gb+h)$ and the upper bound of $u$, that
is $(e+f)/(g+h)$, a maximum value of $v$ occurs. To







see this, note when $u = (e+f)/(g+h)$ that
$dv/du = \psi'(1)\,(g+h)^{4}/(he-fg)^{2}$, which is
negative or zero, since $\psi'(1)$ is negative or zero. If it is negative,
there is a maximum where the sign of the continuous first
derivative changes from positive to negative. If $dv/du$ is
zero at $u = (e+f)/(g+h)$, it follows also that there
is at least one maximum of $v = \varphi(u)$ between $u = (eb+f)/(gb+h)$
and $u = (e+f)/(g+h)$, since $v = 0$ at $u = (e+f)/(g+h)$
and $v$ must have changed from an increasing positive function
at $u = (eb+f)/(gb+h)$ to a decreasing function before
becoming zero at $u = (e+f)/(g+h)$. Similarly, it may
be shown that there is a mode at a value of $u < (eb+f)/(gb+h)$
whenever $dv/du$ is negative at $u = (eb+f)/(gb+h)$.
We may then state the following:


Theorem II. Given a unimodal continuous positive
function $y = \psi(x)$ of variates $x$, with a range
from $-1$ to $+1$, with a mode at $x = b$ $(-1 < b < 1)$,
with $\psi(-1) = \psi(1) = 0$, and with the derivative $\psi'(x)$
continuous from $x = -1$ to $x = 1$, then the frequency
distribution $v = \varphi(u)$ of variates $u = (ex+f)/(gx+h)$
$(g > 0)$ has a mode at a value of $u > (eb+f)/(gb+h)$
when $g+h > 0$. It has a mode at a value of
$u < (eb+f)/(gb+h)$ when $g+h < 0$.


Since we have so restricted our transformation $u = (ex+f)/(gx+h)$
that the order of corresponding values is preserved, the
transformation carries the median of the distribution of $x$'s into the
median of the distribution of $u$'s, and we may state the following:

Corollary. If $y = \psi(x)$ has its median and
mode coincident at $x = b$, the frequency distribution
$v = \varphi(u)$ of $u = (ex+f)/(gx+h)$ has a
modal value greater or less than its median according as
$g+h$ is greater or less than zero.





























Thus far we have imposed the condition $g < |h|$. Let
us next consider the cases in which $h = -g$ and $h = g$
instead of requiring that $g < |h|$. Consider first the case
$h = -g$. In this case

$$u = \frac{ex+f}{g(x-1)} \tag{7}$$

and

$$\frac{du}{dx} = \frac{he-fg}{(gx+h)^{2}} = -\frac{e+f}{g(x-1)^{2}}. \tag{8}$$





Both $u$ and $du/dx$ become infinite as $x$ approaches
1. Suppose $e$ and $f$ so chosen that $u$ is an increasing
function of $x$ for the interval $-1 \le x < 1$; then $u$ in (7) is
an increasing function of $x$ for the larger interval $-\infty < x < 1$;
and it follows, for the case $h = -g$, that $e/g$ is too small to
be an admissible value of $u$ when $-1 \le x < 1$, since it is the
value of $u$ when $x = -\infty$.

For the case $h = g$, we have

$$u = \frac{ex+f}{g(x+1)} \tag{9}$$

and

$$\frac{du}{dx} = \frac{e-f}{g(x+1)^{2}}. \tag{10}$$
















Since $u$ in (9) is an increasing continuous function of $x$
for the interval $-1 < x < \infty$ whenever $e$ and $f$ are so selected
that it is increasing for the sub-interval $-1 < x \le 1$, it follows,
for $h = g$, that $e/g$, the value of $u$ when $x = \infty$,
is too large to be an admissible value of $u$ when $-1 < x \le 1$.
By making use of the fact that $e/g$ is too small or too large




to be an admissible value of $u$ according as $h = -g$ or $+g$,
we readily obtain the following results from an examination of
(6): The derivative $dv/du$ given in (6) is positive at the
point $u = (eb+f)/(gb+h)$ when $h = g$, and it is
negative at this point when $h = -g$.

Moreover it readily follows, as in the case where $g < |h|$,
that when the derivative $dv/du$ is positive for the value
of $u$ into which the modal $x = b$ transforms, then $dv/du$
is non-negative for all smaller values of $u$, and when $dv/du$
is negative for the value of $u$ into which the modal value $x = b$
transforms, it is non-positive for all larger values of $u$.

Next, for the case $h = g$, a mode occurs for a value of
$u > (eb+f)/(gb+h)$. This may be seen by noting that
as $x$ approaches 1 and as $u$ takes corresponding values,
$dv/du$ in (6) approaches the value $16 g^{2}\,\psi'(1)/(e-f)^{2}$,
which is negative or zero. The analysis given above for the
corresponding case $g < |h|$ may be applied, with the conclusions
stated in Theorem II, by replacing $g+h > 0$ by $h = g$ and
$g+h < 0$ by $h = -g$.

The question very naturally arises as to whether there exists
a linear fractional transformation $u = (ex+f)/(gx+h)$
that will transform almost any distribution with the properties
of $y = \psi(x)$ into a new distribution $v = \varphi(u)$ with
a mode at a previously assigned point $u = c$ within the range
of admissible values of $u$. To insure a mode for $v = \varphi(u)$
at $u = c$, it is, of course, sufficient that there exist values of
$e$, $f$, $g$, and $h$ that make the continuous function

$$\frac{dv}{du} = \frac{(he-fg)^{2}}{(gu-e)^{4}}\,\psi'\!\left(\frac{f-hu}{gu-e}\right) - \frac{2g(he-fg)}{(gu-e)^{3}}\,\psi\!\left(\frac{f-hu}{gu-e}\right) \tag{11}$$

change sign from positive to negative at $u = c$.
Since the only restrictions on $e$, $f$, $g$, and $h$ are that
























they shall be real, and that $g$ and $he-fg$ shall be positive,
it seems that the requirement that $dv/du$ shall change from
positive to negative at an assigned value of $u$ could probably
be satisfied for some important classes of relatively simple
functions. As a simple example, take the quadratic function
$\psi(x) = Ax^{2}+Bx+C$, which, when subjected to the
conditions on $\psi(x)$, becomes $\psi(x) = 3(1-x^{2})/4$.

The mode is in this case at $x = 0$. The problem we propose
is to find the linear fractional transformation $u = (ex+f)/(gx+h)$
that will transform $\psi(x)$ into $v = \varphi(u)$ with a mode at an
assigned $u = c$. In this case (11) becomes

$$\frac{dv}{du} = -\frac{3}{2}\,\frac{he-fg}{(gu-e)^{5}}\Big\{(he-fg)(f-hu) + g\big[(g^{2}-h^{2})u^{2} + 2u(fh-eg) + e^{2}-f^{2}\big]\Big\}. \tag{12}$$















To facilitate the examination of (12), make $h = g$. Then
(12) reduces to

$$\frac{dv}{du} = -\frac{3}{2}\,\frac{g^{2}(e-f)^{2}}{(gu-e)^{5}}\,(e+2f-3gu). \tag{13}$$





Since $g+h > 0$, we have $gu-e < 0$, and consequently
the coefficient of $(e+2f-3gu)$ is positive. To provide for
the change of sign of (13) at $u = c$, select $e$, $f$, and $g$
so that $e+2f = 3cg$. To make (13) positive at $u = c-\delta$
and negative at $u = c+\delta$, where $\delta$ is arbitrarily small and
positive, we may assign to $g$ any positive value and to $e$ any
value greater than $cg$, for then $f$ is less than $e$, which is
the condition $he-fg > 0$ when $h = g$. While there are
thus an infinite number of ways in which we may select a linear




fractional transformation so that, when applied to special functions,
it will give a new distribution with a mode at an assigned
point, no general proposition is proved that assures an assigned
modal value of $v = \varphi(u)$.
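The parabolic example lends itself to a numerical check. The sketch below is an illustration under stated assumptions rather than the author's procedure: it takes $h = g$, chooses $e$ and $f$ so that $e + 2f = 3cg$ with $e > cg$, and locates the maximum of the transformed density by a plain grid search. The helper names, the `margin` parameter, and the search window are hypothetical.

```python
def psi(x):
    """Parabolic frequency function 3(1 - x^2)/4 on [-1, 1]."""
    return 0.75 * (1.0 - x * x)


def choose_coefficients(c, g=1.0, margin=1.0):
    """Return (e, f, g, h) with h = g, e + 2f = 3cg and e = cg + margin > cg.

    `margin` is a hypothetical knob; any positive value satisfies the text.
    """
    e = c * g + margin
    f = (3.0 * c * g - e) / 2.0
    return e, f, g, g


def transformed_density(u, e, f, g, h):
    """v = psi(x(u)) * dx/du with x = (f - h*u) / (g*u - e)."""
    x = (f - h * u) / (g * u - e)
    return psi(x) * (h * e - f * g) / (g * u - e) ** 2


def grid_mode(c, width=10.0, steps=200001):
    """Grid-search the mode of v on [upper - width, upper].

    For h = g the true range of u is unbounded below, so the window is an
    assumption made only to keep the search finite.
    """
    e, f, g, h = choose_coefficients(c)
    upper = (e + f) / (g + h)  # value of u at x = 1
    best_u, best_v = upper, -1.0
    for i in range(steps):
        u = upper - width * i / (steps - 1)
        v = transformed_density(u, e, f, g, h)
        if v > best_v:
            best_u, best_v = u, v
    return best_u


print(grid_mode(0.3))  # expected to lie within one grid step of c = 0.3
```

Because (13) changes sign only at $u = c$, the grid maximum should fall within one grid step of the assigned point for any admissible choice of `margin`.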


























ON SMALL SAMPLES FROM CERTAIN 


NON-NORMAL UNIVERSES* 


By 


Paul R. Rider
Washington University 





INTRODUCTION 


The distribution of the ratio

$$z = \frac{\text{mean of sample} - \text{mean of universe}}{\text{standard deviation of sample}},$$

which is of great importance in the theory of small samples, has
been derived exactly by theoretical methods for samples of any
size from a normal universe.¹ Experimental studies² have been


*The writer desires to express his grateful appreciation to the National 
Research Council, which made possible this study by a grant-in-aid for 
the assistance of a computer. 


¹See, for example, R. A. Fisher, Applications of “Student's” Distribution,
Metron, vol. 5, No. 3 (Dec. 1, 1925), pp. 90-104.

²e. g. W. A. Shewhart and F. W. Winters, Small Samples—New Experimental
Results, Journal of the American Statistical Association, Vol. 23
(1928), pp. 144-53;

J. Neyman and E. S. Pearson, On the Use and Interpretation of Certain
Test Criteria for Purposes of Statistical Inference. Part I, Biometrika,
Vol. 20A (1928), pp. 175-240;

“Sophister,” Discussion of Small Samples Drawn from an Infinite Skew
Population, Biometrika, Vol. 20A (1928), pp. 389-423;

E. S. Pearson assisted by N. K. Adyanthaya and others, The Distribution
of Frequency Constants in Small Samples from Non-normal Symmetrical
and Skew Populations. 2nd paper, Biometrika, Vol. 21 (1929), pp. 259-86.







made of the $z$-distribution for samples of specific sizes from
other types of universe. A theoretical method applicable to
samples from a discrete universe was used in a previous paper,¹
in which a rectangular universe was studied in some detail. The
rectangular universe was chosen as being the simplest from the
standpoint of the method employed, and as a good example of
a limited symmetric distribution. It is the purpose of the present
paper to apply the method to a triangular population, which is
a specimen of a limited skew distribution, and also to a U-shaped
universe. The rectangular, triangular and U-shaped universes
are shown in Table I in the columns headed $R$, $T$, and $U$,
respectively. Their graphs are exhibited in Figure 1.

In addition to the z -distribution, the distributions of means 
from the triangular and from the U-shaped universe are given. 

In the concluding section is discussed the probability corres- 
ponding to an interval of three sample standard deviations on 
each side of the sample mean. 

All of the results of the paper are for samples of four. 
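For concreteness, the ratio $z$ for a single sample of four can be computed as below. This is a minimal sketch and not the method of the paper: the function name is hypothetical, and the sample standard deviation is assumed to use the divisor $n$, which agrees with the value $s = 0.5$ quoted later for the sample (1, 1, 2, 2).

```python
import math


def z_ratio(sample, universe_mean):
    """Return (sample mean - universe mean) / sample standard deviation.

    The standard deviation uses the divisor n (an assumption consistent
    with the worked example in the text). Returns None when s = 0.
    """
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    if s == 0.0:
        return None  # z is undefined for a sample with no spread
    return (mean - universe_mean) / s


# Rectangular universe with values 0..9, whose mean is 4.5.
print(z_ratio((1, 1, 2, 2), 4.5))
```

Samples whose four values coincide give $s = 0$ and hence no finite $z$; the sketch returns `None` for them rather than guessing how the original tabulation treated such cases.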


THE DISTRIBUTION OF $z$


The distributions of $z$ are shown in Table II,² in which
the distribution for samples from a normal universe, $N$, is also
given.

The cumulated probabilities of $z$ for the triangular and for
the U-shaped universe are shown in Table III, which may be
compared with a similar table for a rectangular and for a normal
universe given in Biometrika, Vol. 21 (1929), p. 131.


¹P. R. Rider, On the Distribution of the Ratio of Mean to Standard Deviation
in Small Samples from Non-normal Universes, Biometrika, Vol. 21
(1929), pp. 124-143.


²For an explanation of the method of deriving these distributions see
Rider, loc. cit.



















These cumulated probabilities are plotted on probability paper 
in Figures 2 and 3 and may be compared with similar probabil- 
ities for a rectangular universe by reference to Biometrika, Vol. 
21 (1929), p. 129, Figure 2. 

The principal results to be noted are as follows: 

1. The general characteristics of the $z$-distribution for
the U-shaped universe are the same as those for a rectangular
universe, viz. a greater number of $z$'s outside of a certain value
of $|z|$, and also a greater clustering of $z$'s about the origin,
than is the case for a normal universe.¹ This is to be expected,
since the values of $\beta_{2}$ for $U$ and $R$ are 1.132 and 1.776
respectively, as compared with the value 3 for $N$.

2. The negative skewness in the triangular universe produces
skewness of the opposite type in the distribution of $z$,
as found experimentally by Neyman and E. S. Pearson² and by
“Sophister.”³ This means (in the case of negative skewness
in the universe) that the probability corresponding to an interval
from $-\infty$ to $z$ is smaller than when the sampling is from a
normal universe.

3. The cumulated probability of $|z|$, or the probability
corresponding to an interval from $-z$ to $z$, is somewhat the
same for the triangular universe as for a normal universe;⁴ a
comparison is made in Table IV.

Results 2 and 3 are apparently due to the fact that in a 


¹See Rider, loc. cit. p. 130.
²Biometrika, Vol. 20A (1928), p. 198.
³Biometrika, Vol. 20A (1928), p. 408.


⁴cf. E. S. Pearson assisted by N. K. Adyanthaya and others, The Distribution
of Frequency Constants in Small Samples from Non-normal Symmetrical
and Skew Populations. 2nd paper, Biometrika, Vol. 21 (1929),
pp. 259-86.







skew universe the regression of variance on mean¹ is often essentially
linear (if parabolic, the vertex of the parabola is well to
one side of the scatter diagram). Let us consider the case in
which the slope of the regression line is positive. Designating
by $x$ the difference between the mean of a sample and the
mean of the universe, and by $s$ the standard deviation of the
sample, we see that large values of $|{+x}|$ tend to be associated
with large values of $s^{2}$ (and therefore with large values of $s$).
Thus the values of $z$ tend to be smaller. On the other hand,
for large values of $|{-x}|$, $s$ is smaller and $|{-z}|$ consequently
larger. This means that the frequencies corresponding to the
algebraically lower values of $z$ are greater than in the case of
a normal universe, or that the use of “Student's” tables would
give results too small for the probability that the mean of a
sample does not exceed algebraically the mean of the universe
by more than $z$ times the standard deviation of the sample.
The opposite is true in the case studied here, since the universe
is negatively skew and the regression line of $s^{2}$ on $x$ would
have a negative slope.

Since there is a shifting of the whole cumulated $z$-distribution
to the right or left, the effect noted in 3 is readily explained.
As a result of this effect we should apparently not be far wrong,
when sampling from a skew universe, if we used “Student's”
tables to obtain the probability that the mean of a sample does
not exceed numerically the mean of the universe by more than
$z$ times the standard deviation of the sample.²


¹For the regression formula see J. Neyman, On the Correlation of the
Mean and the Variance in Samples from an “Infinite” Population, Biometrika,
Vol. 18 (1926), pp. 401-13.


²See E. S. Pearson assisted by N. K. Adyanthaya and others, The Distribution
of Frequency Constants in Small Samples from Non-normal Symmetrical
and Skew Populations. 2nd paper, Biometrika, Vol. 21 (1929),
pp. 259-86.




































THE DISTRIBUTION OF MEANS OF SAMPLES 





The distributions of means of samples are shown in Tables
V and VI. In these tables $x$ indicates the difference between
the mean of the sample and the mean of the universe.

For the difficulties involved in obtaining satisfactory results
for the distribution of means of small samples from a U-shaped
universe see K. J. Holzinger and A. E. R. Church, “On the
Means of Samples from a U-shaped Population,” Biometrika,
Vol. 20A (1928), pp. 361-88.

The probability corresponding to an interval of three 
sample standard deviations on each side of the sample mean. 

If $M$ is the mean and $\sigma$ the standard deviation of a normally
distributed variate $X$, then, as is well known, the probability
that an item selected at random will lie within the range
$M \pm 3\sigma$ is 0.997. If $\bar{X}$ and $s$ are the mean and the
standard deviation respectively of a sample, the expected or average
probability corresponding to the interval $\bar{X} \pm 3s$ will be
different from the probability corresponding to the interval
$M \pm 3\sigma$. Shewhart¹ obtained experimentally for the average
probability for samples of four associated with the interval
$\bar{X} \pm 3s$ the values 0.90 for a normal universe, 0.91 for a
rectangular universe, and 0.91 for a triangular universe.

By analyzing all possible samples of four from the rectangular
and triangular universes of Table I it was possible to obtain
the probability corresponding to an interval of $3s$ on either
side of the sample mean. For example let us consider the
sample (1, 1, 2, 2), for which $\bar{X} = 1.5$, $s = 0.5$. The interval
$\bar{X} \pm 3s$ extends from 0 to 3. This interval includes 0.4 of
the rectangular universe $R$; 0.4 then is the probability that an


¹W. A. Shewhart, Note on the Probability Associated with the Error of
a Single Observation, Journal of Forestry, Vol. 26 (1928), pp. 601-607.




observed value will fall within the interval. Now the particular
sample (1, 1, 2, 2) would occur 6 times out of 10,000. If we
take all of the samples for which the interval $\bar{X} \pm 3s$ includes
0.4 of the rectangular universe we find that such samples occur
106 times out of 10,000. Such an analysis leads to Table VII,
from which it is ascertained that the average probability corresponding
to an interval of $\bar{X} \pm 3s$ is 0.920. A similar analysis
of the triangular universe $T$ gives us Table VIII and yields
0.907 as the average probability associated with $\bar{X} \pm 3s$. A
better understanding of the situation may be obtained from
Figure 4.
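The enumeration described above can be sketched in a few lines. The code below is an illustrative reconstruction under stated assumptions, not the author's computation: the rectangular universe is taken to be the values 0 through 9 with unit frequency, the standard deviation uses the divisor $n$, interval endpoints are treated as inclusive (as the 0-to-3 example implies), and the function names and tolerance are hypothetical.

```python
import math
from itertools import product


def coverage(sample, values, weights, tol=1e-9):
    """Proportion of the universe lying within mean +/- 3s of the sample."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)  # divisor n assumed
    low, high = mean - 3.0 * s, mean + 3.0 * s
    inside = sum(w for v, w in zip(values, weights) if low - tol <= v <= high + tol)
    return inside / sum(weights)


def average_coverage(values, weights, n=4):
    """Average the coverage over all ordered samples of size n, weighted by frequency."""
    total_weight = 0
    accumulated = 0.0
    for indices in product(range(len(values)), repeat=n):
        sample = [values[i] for i in indices]
        weight = 1
        for i in indices:
            weight *= weights[i]
        accumulated += weight * coverage(sample, values, weights)
        total_weight += weight
    return accumulated / total_weight


rect_values = list(range(10))  # assumed universe R: 0, 1, ..., 9
rect_weights = [1] * 10        # each value with frequency 1

print(coverage((1, 1, 2, 2), rect_values, rect_weights))  # the 0.4 of the worked example
print(average_coverage(rect_values, rect_weights))        # to be compared with the 0.920 reported above
```

The second printed value has not been checked here against the 0.920 stated in the text; any discrepancy would point to a difference in the assumed conventions (divisor, endpoint handling) rather than to the tabulated result.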


Paul R. Rider







TABLE I 


Rectangular, Triangular and U-Shaped Universes 


FREQUENCY 


fo ca eal 


10° 


~ 


0 1 
l 1 
2 1 
3 1 
4 1 
5 1 
6 1 
7 1 
8 1 
9 1 


wooaonN aA A WN — 
= et ttt tO 


— 
© 


o 
5, 


*The values of the $\beta$'s are uncorrected for grouping. The
dots over the digits indicate repeating decimals. The values for
a continuous rectangular distribution are $\beta_{1} = 0$, $\beta_{2} = 1.8$, and
for a continuous triangular distribution are $\beta_{1} = 0.32$, $\beta_{2} = 2.4$.







TABLE II 


Probability of $z$ for Samples of 4







TABLE III 


The cumulated probability of $z$, or probability that the mean
of a random sample of 4 will not exceed (in algebraic sense) the
mean of the universe by more than $z$ times the standard deviation
of the sample.


Cumulated Probability Cumulated Probability 
Triangular Universe U-Shaped Universe 


oO 


0 
a 
2 
a 
4 
5 
6 
7 
8 
9 
0 
1 
2 
3 
4 
§ 
6 
a 
8 
9 
0 
1 
2 
a 
4 
5 
6 
7 
8 
9 
0 
5 
0 







TABLE IV 


Cumulated Probability of $|z|$ for Samples of 4.


Probability Prohability 
|z| l2| 


greater | Triangular| Normal greater | Triangular 
than Universe {Universe than Universe 
.0000 : i .1042 


2 
oO 


RB 
2 
3 
4 
os 
6 
J 
8 
9 







TABLE V 


Distribution of Means of Samples of 4 from Triangular Universe


$x$ | Probability | $x$ | Probability | $x$ | Probability


$x$ = (mean of sample) − (mean of universe)







TABLE VI 


Distribution of Means of Samples of 4 from U-Shaped Universe 


0. 

0.75 
1.00 
1.25 
1.50 
1.75 
2.00 
2.25 
2.50 
2.75 
3.00 
3.25 
3.50 
3.75 
4.00 
4.25 
4.50 


$x$ = (mean of sample) − (mean of universe)










TABLE VII 


Probability Corresponding to the Interval $\bar{X} \pm 3s$
Rectangular Universe


Proportion of universe included in $\bar{X} \pm 3s$* | Number of samples for which this proportion occurs**


*i. e. the probability corresponding to $\bar{X} \pm 3s$.

**The probability of the occurrence of this proportion is, of course, obtained
by dividing by 10000.










TABLE VIII 
Probability Corresponding to the Interval $\bar{X} \pm 3s$
Triangular Universe 








Proportion of universe included in $\bar{X} \pm 3s$* | Number of samples for which this proportion occurs | Probability of occurrence of this proportion | Cumulated probability
1/55 = .018
2/55 = .036
3/55 = .055-
4/55 = .073
5/55 = .091
6/55 = .109
7/55 = .127
8/55 = .145+
9/55 = .164
10/55 = .182
12/55 = .218
13/55 = .236
14/55 = .255-
15/55 = .273
18/55 = .327
19/55 = .345+
20/55 = .364
21/55 = .382
22/55 = .400
24/55 = .436
25/55 = .455-
26/55 = .473
27/55 = .491
28/55 = .509
30/55 = .545+
33/55 = .600
34/55 = .618
35/55 = .636
36/55 = .655-
39/55 = .709
40/55 = .727
42/55 = .764
44/55 = .800
45/55 = .818
49/55 = .891
52/55 = .945+
Total 9150625 


*i. e. the probability corresponding to $\bar{X} \pm 3s$.





FIGURE 1
Rectangular Universe, Triangular Universe, and U-Shaped Universe




FIGURE 2
Cumulated Probability of $z$ - Triangular Universe
The curve is for samples of 4 from a normal universe.
The dots are for samples of 4 from the universe $T$.










FIGURE 3 
Cumulated Probability of $z$ - U-Shaped Universe
The curve is for samples of 4 from a normal universe. 


The dots are for samples of 4 from the universe U . 








FIGURE 4
Probability Corresponding to the Interval $\bar{X} \pm 3s$
Axis: cumulated per cent of observations.
Plotted for samples of 4 from a rectangular universe and (dots) for samples of 4 from a triangular universe.


















AN EMPIRICAL DETERMINATION OF THE 
DISTRIBUTION OF MEANS, STANDARD 
DEVIATIONS AND CORRELATION COEFFIC- 
IENTS DRAWN FROM RECTANGULAR 
POPULATIONS* 


By 


Hilda Frost Dunlap


Territorial Normal and Training School, Honolulu, Hawaii


Formulae for the standard errors of means, standard devia- 
tions and correlation coefficients have been derived on the as- 
sumption of a normal distribution in the sampled population. 
They are said to serve approximately even when the population 
varies considerably from the normal. This paper presents em- 
pirical evidence of their applicability in the case of means and 
standard deviations of samples of ten from a rectangular dis- 
continuous population, and of correlation coefficients of samples 
of fifty-two from a rank distribution. 

The data for the study of the distribution of means and
standard deviations were secured by throwing ten dice 1600 times. 

The dice were cubes four-tenths of an inch along an edge 
and numbered on opposite faces 1-6, 2-5, 3-4. They were con- 
structed of bone and formed a matched set. 
















*The writer is indebted to Jack W. Dunlap for reading the entire manu- 
script and for checking the mechanical computations. 




These were thrown from a cup whose inside diameter was 
1.75 inches and whose depth was 2.5 inches. The dice were 
shaken in a box and then cast upon an especially prepared flat 
topped table covered with eight thicknesses of an army blanket. 

As a guard against any possible bias in the table, the dice 
were thrown alternately with the right and left hands. After 
each throw the number of aces, deuces, treys, fours, fives, and 
sixes were recorded, and the mean and standard deviation cal- 
culated. In this study each throw was taken as a sample of ten 
drawn from a population of 16,000. 

The next step was to determine whether there was any systematic
bias in the dice used. The a priori expectation for any
particular face of the die is one-sixth, here one-sixth of 16,000,
or 2,666⅔. This is of the nature of a point binomial of the form
$(p+q)^{N}$ with a standard deviation equal to $\sqrt{Npq}$.
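As a quick arithmetic check of the point binomial, the expectation $Np$ and standard deviation $\sqrt{Npq}$ can be evaluated directly for $N = 16{,}000$ and $p = 1/6$. The helper below is an illustrative sketch; its name is not from the paper.

```python
import math


def binomial_expectation_and_sd(n_trials, p):
    """Return (N*p, sqrt(N*p*q)) for a point binomial (p + q)^N with q = 1 - p."""
    q = 1.0 - p
    return n_trials * p, math.sqrt(n_trials * p * q)


expected, sd = binomial_expectation_and_sd(16000, 1.0 / 6.0)
print(round(expected, 2), round(sd, 1))
```

The standard deviation obtained this way agrees with the 47.1 shown under Table I.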


TABLE I 


Distribution of Observed and Theoretical Populations with a 
Test of the Difference of Their Standard Deviations 


$\sigma = (16000 \cdot 1/6 \cdot 5/6)^{1/2} = 47.1$     $s = (\Sigma d^{2}/N)^{1/2} = 70.8$
$s - \sigma = 23.7 \pm 13.76$







Table I gives the observed and expected values of each face. 
The standard deviation of the differences was determined and 
compared with the standard deviation of the expected distribu- 
tion and the probable error of this difference was found. 

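Small $s$ is used here to denote a standard deviation of a
sample, while $\sigma$ represents the standard deviation of the theoretical
or true population. The formula for the standard deviation
of a difference is

$$\sigma_{x-y}^{2} = \sigma_{x}^{2} + \sigma_{y}^{2} - 2 r_{xy}\,\sigma_{x}\sigma_{y},$$

and in particular

$$\sigma_{s-\sigma}^{2} = \sigma_{s}^{2} + \sigma_{\sigma}^{2} - 2 r_{s\sigma}\,\sigma_{s}\sigma_{\sigma}.$$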

The second term drops out here because it is the standard 
deviation of the true standard error and this is equal to zero. The 
third term drops out for the same reason. Table I shows that 
the difference between the obtained and expected standard devia- 
tions is 23.7 + 13.76. As this is less than twice its probable error, 
it can be concluded that the difference is not significant and that 
there is no significant bias in the dice. 


MEANS 


Figure 1 shows the distribution of the 1600 observed means.
A normal curve for $N = 1600$ is superimposed on the histogram.
For this distribution

$\gamma_{1}\ (= \sqrt{\beta_{1}}) = .0160 \pm .0413$, indicating symmetry

$\gamma_{2}\ (= \beta_{2} - 3) = -.1050 \pm .0826$, indicating mesokurtosis

whence we may conclude that the normal curve represents this






FIGURE 1 
Distribution of 1600 means of samples of ten, with fitted normal curve. 
















distribution adequately. 

The curves of this and succeeding figures were drawn 
through points calculated at intervals of 44 0, except that in 
the case of Figures 2 and 3, points beyond + 2 o were calculated 
at intervals of lo. 

The values of the observed means varied from 1.6 to 5.4, a 
range of 6.9129 standard deviations. 

The basic information to be drawn from this study of the 
distribution of 1600 means of samples of ten is given in Table II.
The table is interpreted as follows: 

The mean of the sampled population (16,000) is 3.47306, 
while the theoretical mean of the infinite population is 3.500000, 
The standard deviation of the sampled population (16,000) is 
1.6788, and of the theoretical population 1.7078. The standard 
error of the mean of the sampled population is .0133. In com- 
paring the mean of the sampled population with the mean of the 
theoretical infinite population, the former is treated as an ex- 
perimental value whose standard error can be estimated, while 
the latter, being a true value, has no error, 

The standard deviation of the difference between the means
$\mu$ (theoretical population) and $\bar{X}$ (sampled population) is

$$\sigma_{\mu-\bar{X}}^{2} = \sigma_{\mu}^{2} + \sigma_{\bar{X}}^{2} - 2 r_{\mu\bar{X}}\,\sigma_{\mu}\sigma_{\bar{X}}, \qquad \sigma_{\mu-\bar{X}} = .0133.$$

The first and third terms drop out because $\sigma_{\mu}$ equals zero.
The difference between the mean of the theoretical population
and the sampled population is .02694 ± .00897, from which it can
be concluded that the mean tends to vary from the true mean.

Z will hereafter refer to the mean of a sample of ten. The 
best estimate of the mean of a sample of ten that can be made 
for any sample chosen at random from the sampled population 


TABLE II


Distribution of 1600 Means of Samples of 10 | 
———— SSS llaESSESSSSEEE————SE—S— 
Description Observed Value (3) Theoretical Value (//) 
Mean of Sampled Pop............ 3.47306 
o@ of Sampled Pop. .......... y 
Smean Sampled Pop. ....... ; 
Ora-x) of Sampled Pop. ...... 
/7-x of Sampled Pop. ; 
Mean of Means of Samples ‘ 3.47306 or 3.5000 
S. D. of Means of Samples . 5372 or .5401 
S. E. of S. D. of Means 




A .0000 or .0000 
.0125 *.0065 or .0000 or .0000 
.0096 + .0065 


.0160 * .0413 .00 (normal theory) 
Y, of Distri. of Means 
a - .1050 +.0826 00 (normal theory) 









































is 3.47306, and from the infinite population, 3.5000. 

The standard deviation of the means of 1600 samples is
.5467, while the estimated value for a sample picked at random
from the sampled population is .5372 and from the theoretical
infinite population .5401. These last two values are calculated
by the formula $\sigma_{M} = \sigma/\sqrt{N}$.
The best estimate of the standard deviation of a sample of 
ten picked at random from the sampled population is the o 
of the sampled population, 1.6788, or of the theoretical infinite 
population, 1.7078, whence the values in the tables are obtained. 

The standard error of the standard deviation of the means
of samples is .0097. The standard error of the standard error
$\sigma_{M}$ of the mean of a sample of ten from the sampled and theoretical
infinite populations is zero, as these are true values.

The difference between the standard deviation of the means
and the standard error of such means of samples of ten from
the sampled population or the theoretical infinite population is
.0125 ± .0065. Thus there is no significant difference between
the value of $\sigma_{M}$ when calculated by the formula $\sigma_{M} = \sigma/\sqrt{N}$
and an actual distribution when samples as small as ten are used.

$\gamma_{1}$ indicates, as pointed out above, that the distribution is
not skewed, while $\gamma_{2}$ shows the distribution to be slightly peaked
but not significantly so.
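The theoretical figures in this section follow from a few lines of arithmetic. The sketch below is an illustration under stated assumptions: the standard deviation of a fair six-faced die is taken as $\sqrt{(6^{2}-1)/12}$, the standard error of a mean of $N$ observations as $\sigma/\sqrt{N}$, and the probable error as 0.6745 times the standard error (the usual convention, assumed here). The function names are hypothetical.

```python
import math


def die_sd(faces=6):
    """Standard deviation of a fair die with faces 1..faces."""
    return math.sqrt((faces ** 2 - 1) / 12.0)


def se_of_mean(sigma, n):
    """Standard error of the mean of n independent observations."""
    return sigma / math.sqrt(n)


def probable_error(standard_error):
    """Probable error, assuming the conventional factor 0.6745."""
    return 0.6745 * standard_error


sigma = die_sd()
print(round(sigma, 4))                  # theoretical sigma of the infinite population
print(round(se_of_mean(sigma, 10), 4))  # standard error of a mean of ten
print(round(probable_error(0.0133), 5)) # probable error attached to .02694
```

The three printed values correspond to the 1.7078, .5401 and .00897 quoted in the text.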





STANDARD DEVIATIONS 


Figure 2 shows a histogram and a fitted Gram-Charlier Type
A curve, of the distribution of 1600 standard deviations of
samples of ten calculated by the formula

$$s = \sqrt{\frac{\Sigma x^{2}}{N}},$$

$x$ being measured from the mean, $\bar{X}$.
Figure 3 shows a similar histogram and curve fitted to the 




FIGURE 2
Distribution and fitted Gram-Charlier curve of 1600 standard deviations of samples of ten, calculated
by the formula $s = \sqrt{\Sigma x^{2}/N}$





FIGURE 3
Distribution and fitted Gram-Charlier curve of 1600 standard deviations of samples of ten, calculated
by the formula $s = \sqrt{\Sigma x^{2}/(N-1)}$







TABLE III 


Distribution of 1600 Standard Deviations of Samples of Ten 


Description Observed Value Theoretical Valuc 
2,.Zx* 
52.22 o = NF ft opulation Population 


Z of s’s of sam. 

S. D. of s’s of sam. 

S. D. of Z of $’s of 
samples 

S. D. of 5s of s's of 
samples 


Y, (skewness) 


7% (kurtosis) 







same data when the standard deviations are calculated by the
formula

$$s = \sqrt{\frac{\Sigma x^{2}}{N-1}}.$$


A study of this latter formula is included here to test which 
is more appropriate when dealing with small samples from a 
rectangular population. 

An interpretation of Table III is now in order. Column one
is a description of the statistics involved. Column two is subdivided
into two parts: first, when $s$ equals $\sqrt{\Sigma x^{2}/N}$, and
second, when $s$ equals $\sqrt{\Sigma x^{2}/(N-1)}$. Column three gives the
theoretical values. There are two of these, one for the sampled
population and one for the infinite population. In the case of
the sampled population the values calculated for the standard
deviation and the $\sigma_{s}$ become true values when a single sample
is compared with them in exactly the same manner as if compared
with similar values from the infinite population. The reason
for this is that for a given sample the 16,000 constitutes the actual
population from which the sample is drawn.

In the first line the means of the standard deviations of the 
samples are found to equal respectively, 1.5869 and 2.0403. The 
theoretical means for the sampled and infinite populations are 
respectively 1.6988 and 1.7078. 

In the next line are the standard deviations of standard
deviations of samples. These are calculated values, obtained by
substituting in the formula

$$\sigma_{s} = \frac{\sigma}{\sqrt{2N}}.$$

As the best estimate of the standard deviations of any particular
sample chosen at random is the standard deviation of the sampled
population, or the infinite population, these values can be substituted
in the above formula in obtaining the standard error of
the standard deviation of such a sample of ten.
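The substitution can be sketched as follows, using $\sigma/\sqrt{2N}$ with $N = 10$ and the two population standard deviations quoted in this section (1.6988 for the sampled population and 1.7078 for the infinite population). The function name is an illustrative assumption.

```python
import math


def se_of_sd(sigma, n):
    """Normal-theory standard error of a sample standard deviation: sigma / sqrt(2n)."""
    return sigma / math.sqrt(2.0 * n)


print(round(se_of_sd(1.6988, 10), 4))  # sampled population
print(round(se_of_sd(1.7078, 10), 4))  # infinite population
```

The first value reproduces the .3799 used in the later comparison of standard errors.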
The standard error of the mean of standard deviations in 







samples for both observed values is given in line three. Obviously 
in the case of the sampled and infinite populations these equal 
zero. It should be clearly understood by the reader that here N 
equals 1600, the number of standard deviations used in deter- 
mining the mean standard deviation. 

Line four gives the standard error of the standard deviation 
of standard deviations of samples of ten. 

Line five gives the difference between each of the true
standard deviations (sampled and infinite) and the two observed
mean standard deviations. The standard deviations of the sampled
population and of the infinite population are each greater than the
mean standard deviation of the observed population when
calculated by the first formula for $s$. In the first case the
difference is .1119 ± .0045. This is approximately 25 times its
probable error, so it must be considered a significant difference.
The difference when compared with the theoretical infinite
population is .1209 ± .0045. This is even more significant. When
the theoretical values are compared with the mean standard
deviation calculated by the second formula for $s$, the differences
are found to be .3415 ± .0042 and .3325 ± .0042. The differences
here are much greater than those found from the first formula.

Line six shows the difference between the standard errors
of the standard deviations of the true populations and the
calculated $s_s$ of the samples. The difference between $\sigma_s$
and $s_s$ (.3799 - .2665) is .1134 ± .0032. This difference is
approximately 35 times its probable error. The difference between
.3799 and .2538 is even greater. Still larger differences are found
when $s_s$ is calculated for the second formula for $s$.

$\gamma_1$ in the case of both curves is negative and more than
8 times its probable error, definitely showing a negative skewness.
$\gamma_2$ in the case of both curves is 6 times greater than its
probable error, indicating definite leptokurtosis. The Gram-Charlier
curves shown in Figures 2 and 3 were fitted to the first four
moments according to the equation





Y: tee oF | I-(S3)ox 2) Gir eyateXrs) 


where x= xz 

If we compute values of $s$ by the empirical formula, the mean
value is 1.7039, which lies very close to the theoretical values
1.6988 and 1.7078, in fact almost exactly half-way between them.


CORRELATION COEFFICIENTS 


The product-moment correlation coefficient varies between
the limits plus one and minus one. Obviously, the distribution
of correlation coefficients cannot be normal, although in the case
where the population correlation is zero their distribution should
approximate a normal curve, as it can become symmetrical.
Coefficients around any other point tend to be distributed
asymmetrically.

It was assumed that if a deck of cards be thoroughly shuffled 
there should be no correlation between successive deals. Using 
a deck of cards gives a sample of 52. A new pack was 
thoroughly shuffled. The cards were then dealt one at a time, 
the first card dealt being recorded as number one, the second 
card dealt as number two, the third card as number three, etc. 
That is, if the seven of hearts was turned first, the value one 
was recorded against its place in the table. After each deal the 
cards were picked up in the same order and shuffled three times 
by the fan method and then cut twice. Sixty such deals were 
made and recorded. Then rank correlations were calculated
between each pair of deals, the total number of intercorrelations
being $\frac{n(n-1)}{2}$, here 1770.


FIGURE 4

Distribution of 1770 correlation coefficients of samples of 52,
with fitted normal curve.

In this study, there could be no split ranks. Each card could 
receive one and only one rank on each deal. Thus, the rank 
correlation formula gave exactly the same values as would a 
Pearson product-moment coefficient. 
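
The experiment and the equivalence claimed above can be imitated in a short sketch. It assumes that a thorough shuffle may be modelled by a uniformly random permutation (the fan shuffle and cuts of the original are not reproduced), and the function names are illustrative, not the author's.

```python
import random
from itertools import combinations
from statistics import mean, pstdev

def rank_correlation(a, b):
    """Rank-difference formula 1 - 6*sum(d^2) / (n(n^2-1)); no tied ranks."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n * (n * n - 1))

def product_moment(a, b):
    """Pearson product-moment coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def simulate(deals=60, cards=52, seed=0):
    """Deal `deals` shuffled packs; entry c of a deal is the rank of card c."""
    rng = random.Random(seed)
    table = []
    for _ in range(deals):
        ranks = list(range(1, cards + 1))
        rng.shuffle(ranks)  # assumption: a uniformly random permutation
        table.append(ranks)
    return table

table = simulate()
coefficients = [rank_correlation(a, b) for a, b in combinations(table, 2)]
print(len(coefficients), mean(coefficients), pstdev(coefficients))
```

With sixty deals there are 1770 pairs, and because every deal is a permutation of the ranks 1 to 52 the two coefficients agree for every pair up to rounding error.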

Figure 4 shows a histogram with a fitted normal curve
superimposed on it. $\gamma_1$ for this curve is .000015 ± .0392,
indicating no skewness, and $\gamma_2$ is .2174 ± .0785, indicating a
slight tendency to peakedness. Both of these facts are shown by
the fit of the curve to the histogram.

The formula for the standard error of a correlation coefficient
from a normal population is

$$\sigma_r = \frac{1-\rho^2}{\sqrt{N}},$$

$\rho$ being the correlation in the population. Thus when
$\rho = .0000$ and $N = 52$, $\sigma_r = .1387$.

The mean value of the 1770 coefficients is $r = -.0012$. The
expected mean is zero. The difference between these two values
is .0012 ± .0022. This shows that the mean correlation coefficient
is not significantly different from the expected mean correlation.
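
The two figures used in this comparison can be reproduced with a few lines. This is a sketch only: it assumes the standard error formula $(1-\rho^2)/\sqrt{N}$ and the conventional factor .6745 for converting a standard error into a probable error, neither of which is spelled out at this point in the text.

```python
import math

def se_of_r(rho, n):
    """Standard error of a correlation coefficient, (1 - rho^2) / sqrt(n)."""
    return (1 - rho ** 2) / math.sqrt(n)

sigma_r = se_of_r(0.0, 52)

# Probable error of the mean of 1770 coefficients (assumed factor .6745).
pe_of_mean = 0.6745 * sigma_r / math.sqrt(1770)

print(round(sigma_r, 4), round(pe_of_mean, 4))
```

Under these assumptions the sketch returns .1387 for the standard error and .0022 for the probable error of the mean, in agreement with the values quoted.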

The standard deviation of the observed distribution is .1359.
This value differs from the expected value by .0028 ± .0091. The
formula $\sigma_r = (1-\rho^2)/\sqrt{N}$ is therefore seen to give a
sufficiently close approximation in this case.




























CONCLUSIONS 


1. The distribution of means of samples of ten drawn from
a discontinuous rectangular population is normal. The formula
$\sigma_{\bar{x}} = \sigma/\sqrt{N}$ gives a reasonably close estimate
of the standard error of such means.


2. The distribution of standard deviations of samples of
ten drawn from a discontinuous rectangular population is skewed
and leptokurtic. The formula $\sigma_s = \sigma/\sqrt{2N}$ does not
give a reasonably close estimate of the standard deviation of
standard deviations of samples of ten, whichever of the two
formulas is used to compute the $s$'s.

3. Neither of the two formulas for the standard deviation of
a sample of ten gives a reasonably close estimate of the true
standard deviation in a rectangular discontinuous population. The
empirical formula, with divisor $N - .25$, does appear to do so.

4. The distribution of correlation coefficients of samples
of 52 from a rank population in which the expected correlation
is zero, is symmetrical and very slightly leptokurtic. The formula
$\sigma_r = (1-\rho^2)/\sqrt{N}$ represents adequately the standard
deviation of such correlation coefficients.


Hilda F. Dunlap






















EDITORIAL


The Interdependence of Sampling and Frequency
Distribution Theory





The object of the theory of sampling is to describe the phe- 
nomena exhibited by all the samples that can possibly arise from 
a parent population of known characteristics. In some cases the 
desired description can be obtained directly by employing elemen- 
tary operations of combination theory, in others it is either ex- 
pedient or necessary to use the indirect attack of the statistical 
theory of sampling. These two methods are quite different in
application, and it is advisable to illustrate the respective peculi- 
arities of the two methods. 

Example 1. An auction bridge hand may be regarded as a
single sample withdrawn from a parent population of 52 cards.
The number of different hands that can be selected equals the
number of combinations of 52 things taken 13 at a time, namely,
$\binom{52}{13}$ = 635 013 559 600. Of these

$$f(x) = \binom{13}{x}\binom{39}{13-x}$$

will contain exactly $x$ cards of any specified suit. Therefore if
in this expression we successively place $x$ equal to 0, 1, 2, ... 13
we shall obtain the frequency of all possible samples ranked
according to the number of cards of the specified suit contained in
each sample. The results are presented in the following table.


TABLE I


     x               f(x)

     0      8 122 425 444
     1     50 840 366 668
     2    130 732 371 432
     3    181 823 183 256
     4    151 519 319 380
     5     79 181 063 676
     6     26 393 687 892
     7      5 598 661 068
     8        740 999 259
     9         58 809 465
    10          2 613 754
    11             57 798
    12                507
    13                  1

 Total    635 013 559 600


In this illustration, combination theory has yielded a perfect
solution. The frequencies are exact, and the sum of the
frequencies between any two limits may likewise be obtained exactly
by a simple addition.
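
The direct attack of Example 1 is easy to reproduce. The sketch below assumes only the general term $f(x) = \binom{13}{x}\binom{39}{13-x}$; the function name and its default arguments are illustrative.

```python
from math import comb

def suit_count_frequency(x, suit=13, deck=52, hand=13):
    """Number of hands holding exactly x cards of one specified suit."""
    return comb(suit, x) * comb(deck - suit, hand - x)

frequencies = [suit_count_frequency(x) for x in range(14)]
total = sum(frequencies)

for x, f in enumerate(frequencies):
    print(x, f)
print("total", total)

# The sum of the frequencies between two limits is a simple addition.
print("4 to 6 cards of the suit:", sum(frequencies[4:7]))
```

The total agrees with the number of combinations of 52 things taken 13 at a time, as it must.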

Example 2. The bidding strength of hands in auction bridge
is often approximated by counting each Jack, Queen, King and
Ace as 1, 2, 3 and 4 points, respectively. The total count of a
single hand may range, therefore, from 0 to 37 inclusive. Required
the frequency distribution of all possible hands when they are
classified according to count.

Unlike the preceding problem, we cannot obtain a simple
expression for the general term, $f(x)$, of the required
distribution. But after rather involved computations the following
solution may be obtained:


TABLE II


     x               f(x)

     0      2 310 789 600
     1      5 006 710 800
     2      8 611 542 576
     3     15 636 342 960
     4     24 419 055 136
     5     32 933 031 040
     6     41 619 399 184
     7     50 979 441 968
     8     56 466 608 128
     9     59 413 313 872
    10     59 723 754 816
    11     56 799 933 520
    12     50 971 682 080
    13     43 906 944 752
    14     36 153 374 224
    15     28 090 962 724
    16     21 024 781 756
    17     14 997 080 848
    18     10 192 504 020
    19      6 579 838 440
    20      4 086 538 404
    21      2 399 507 844
    22      1 333 800 036
    23        710 603 628
    24        354 993 864
    25        167 819 892
    26         74 095 248
    27         31 157 940
    28         11 790 760
    29          4 236 588
    30          1 396 068
    31            388 196
    32            109 156
    33             22 360
    34              4 484
    35                624
    36                 60
    37                  4

 Total    635 013 559 600
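
The "rather involved computations" of Example 2 are light work for a machine. The sketch below is one possible way of organizing them, not the method used for the table: it first counts the ways of choosing $k$ honour cards worth $p$ points, then fills the rest of the hand from the 36 cards that carry no count. All names are illustrative.

```python
from math import comb

def count_distribution(hand=13):
    """Frequencies of hands by point count (J=1, Q=2, K=3, A=4)."""
    max_points = 40  # all sixteen honour cards together
    # ways[k][p]: selections of k honour cards totalling p points
    ways = [[0] * (max_points + 1) for _ in range(17)]
    ways[0][0] = 1
    for value in (1, 2, 3, 4):  # one pass per denomination, four cards each
        nxt = [[0] * (max_points + 1) for _ in range(17)]
        for k in range(17):
            for p in range(max_points + 1):
                w = ways[k][p]
                if w == 0:
                    continue
                for m in range(5):  # take m of the four cards
                    if k + m <= 16 and p + m * value <= max_points:
                        nxt[k + m][p + m * value] += w * comb(4, m)
        ways = nxt
    freq = [0] * (max_points + 1)
    for k in range(min(hand, 16) + 1):
        fillers = comb(36, hand - k)  # cards without point value
        for p in range(max_points + 1):
            freq[p] += ways[k][p] * fillers
    return freq[:38]  # counts above 37 need more than 13 cards

distribution = count_distribution()
for points, f in enumerate(distribution):
    print(points, f)
```

The frequencies so obtained must again total the number of possible hands, which gives a convenient check on any tabulated values.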







Example 3. If the mean and the standard deviation of the
weights of a group of 200,000 men be 140 lbs. and 20 lbs.,
respectively, and if in addition it be known that the higher standard
moments of this distribution be


“3: =.9 5.47 443 


Ma sae 317 6.571797, 


what is the chance that the mean weight of 1000 men chosen at 
random from the 200,000 will exceed 141 pounds? 

It is clear that it would be physically impossible to solve this 
problem by employing a direct attack by combination theory, even 
though the weights of each of the 200,000 men were available. 
Moreover, it is likewise evident that in statistical problems
corresponding to the illustrations of examples 1 and 2, the number
of individuals in both the parent population and each sample is
considerably larger than 52 and 13 respectively, and consequently
the calculation of either a single frequency or the sum of any
large group of consecutive frequencies by the direct method is
quite out of the question.

Let us now consider the three examples above from the point
of view of the indirect attack. The parent populations for the
first two examples may be interpreted as


    Variates . . . . .  x      0    1
    Frequencies . . .   f(x)   39   13


and


    Variates . . . . .  x      0    1    2    3    4
    Frequencies . . .   f(x)   36   4    4    4    4


respectively.





For the first, the mean is at $x = 1/4$, and the moments
about the mean of the parent population are obviously

$$\mu_{n:x} = \frac{1}{4^{n+1}}\left[3(-1)^n + 3^n\right].$$

For the second, the mean is at $x = 10/13$, and correspondingly
the moments of this parent population are

$$\mu_{n:x} = \frac{1}{13^{n+1}}\left[9(-10)^n + 3^n + 16^n + 29^n + 42^n\right].$$
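
These two closed forms can be verified in exact arithmetic. The sketch assumes the parent populations are those of Examples 1 and 2 (39 cards scoring 0 and 13 scoring 1; 36 cards scoring 0 and four each scoring 1, 2, 3, 4); the function names are illustrative.

```python
from fractions import Fraction

def central_moment(population, n):
    """n-th moment about the mean of a {variate: frequency} table."""
    total = sum(population.values())
    mean = Fraction(sum(x * f for x, f in population.items()), total)
    return sum(f * (x - mean) ** n for x, f in population.items()) / total

def closed_form_first(n):
    return Fraction(3 * (-1) ** n + 3 ** n, 4 ** (n + 1))

def closed_form_second(n):
    return Fraction(9 * (-10) ** n + 3 ** n + 16 ** n + 29 ** n + 42 ** n,
                    13 ** (n + 1))

suit_population = {0: 39, 1: 13}
count_population = {0: 36, 1: 4, 2: 4, 3: 4, 4: 4}

for n in range(1, 7):
    print(n, central_moment(suit_population, n),
          central_moment(count_population, n))
```

The second moments, for instance, come out as 3/16 and 290/169.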


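If $s$ and $r$ denote the number of individuals in the parent
population and each sample respectively, then the moments of the
distribution of all samples that can arise from this parent
population may be obtained from those of the parent population by
means of the relations

$$
\begin{aligned}
M_z &= r\,M_x \\
\mu_{2:z} &= \mu_{2:x}\cdot s\,(p_1 - p_2) \\
\mu_{3:z} &= \mu_{3:x}\cdot s\,(p_1 - 3p_2 + 2p_3) \\
\mu_{4:z} &= \mu_{4:x}\cdot s\,(p_1 - 7p_2 + 12p_3 - 6p_4)
   + 3\mu_{2:x}^2\cdot s^2(p_2 - 2p_3 + p_4) \\
\mu_{5:z} &= \mu_{5:x}\cdot s\,(p_1 - 15p_2 + 50p_3 - 60p_4 + 24p_5) \\
  &\quad + 10\,\mu_{3:x}\,\mu_{2:x}\cdot s^2(p_2 - 4p_3 + 5p_4 - 2p_5) \\
\mu_{6:z} &= \mu_{6:x}\cdot s\,(p_1 - 31p_2 + 180p_3 - 390p_4 + 360p_5 - 120p_6) \\
  &\quad + 15\,\mu_{4:x}\,\mu_{2:x}\cdot s^2(p_2 - 8p_3 + 19p_4 - 18p_5 + 6p_6) \\
  &\quad + 10\,\mu_{3:x}^2\cdot s^2(p_2 - 6p_3 + 13p_4 - 12p_5 + 4p_6) \\
  &\quad + 15\,\mu_{2:x}^3\cdot s^3(p_3 - 3p_4 + 3p_5 - p_6)
\end{aligned}
\tag{2}
$$

where

$$p_i = \frac{r(r-1)(r-2)\cdots \text{ to } i \text{ factors}}{s(s-1)(s-2)\cdots \text{ to } i \text{ factors}}.$$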
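
The relations for the second, third and fourth moments can be tested against a direct enumeration of every sample from a small parent population. The sketch below assumes the relations in the form $\mu_{2:z} = \mu_{2:x}\,s(p_1-p_2)$, $\mu_{3:z} = \mu_{3:x}\,s(p_1-3p_2+2p_3)$ and $\mu_{4:z} = \mu_{4:x}\,s(p_1-7p_2+12p_3-6p_4) + 3\mu_{2:x}^2 s^2(p_2-2p_3+p_4)$, with $z$ the sum of the $r$ variates in a sample drawn without replacement; the function names and the seven-member test population are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def central_moments(values, orders=(2, 3, 4)):
    """Exact moments about the mean for a list of variates."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return {k: sum((v - mean) ** k for v in values) / n for k in orders}

def p(i, r, s):
    """r(r-1)... to i factors, divided by s(s-1)... to i factors."""
    num = den = 1
    for j in range(i):
        num *= r - j
        den *= s - j
    return Fraction(num, den)

def sum_moments_by_formula(values, r):
    s = len(values)
    mu = central_moments(values)
    p1, p2, p3, p4 = (p(i, r, s) for i in (1, 2, 3, 4))
    return {
        2: mu[2] * s * (p1 - p2),
        3: mu[3] * s * (p1 - 3 * p2 + 2 * p3),
        4: (mu[4] * s * (p1 - 7 * p2 + 12 * p3 - 6 * p4)
            + 3 * mu[2] ** 2 * s ** 2 * (p2 - 2 * p3 + p4)),
    }

def sum_moments_by_enumeration(values, r):
    """Moments of the sample total over every possible sample of r."""
    totals = [sum(sample) for sample in combinations(values, r)]
    return central_moments(totals)

parent = [0, 1, 1, 2, 3, 5, 8]  # arbitrary small population
print(sum_moments_by_formula(parent, 3))
print(sum_moments_by_enumeration(parent, 3))

# Example 1: 13 cards from a pack scored 1 for the suit, 0 otherwise.
print(sum_moments_by_formula([0] * 39 + [1] * 13, 13)[2])
```

For the bridge hand of Example 1 the second moment comes out as 507/272, the familiar hypergeometric variance.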


Since the moments $\mu_{n:x}$ for each of these three examples
are now known, and according to the conditions of the problems
the values of $(r, s)$ are (13, 52), (13, 52), and (1000, 200000)
respectively, it follows that the moments of the desired
distributions of samples are as follows:


659113600 288/17 
§3591421/5331200 17441114/29155 
9339447/1066240 
71781968037/801812480 | 2684384074/39151 


It will be observed that the indirect procedure has yielded 
the moments of the required distributions rather than their fre- 
quency functions, and the next step therefore is to obtain with the 
aid of these moments approximate expressions for the desired 
frequency functions. In this connection it should be borne in 
mind that we are not concerned with questions regarding the 
probable errors of the moments which we are employing, since 
the moments computed for the distributions of samples are
necessarily exact, and their probable errors are therefore zero. For
this reason arguments tending to limit the number of terms that
may be employed in either a Gram-Charlier series, or in the
denominator of Pearson's differential equation are not to the point
so far as our illustrations are concerned. These remarks hold
even for the third example, since if the moments of the parent
population are as given, then the moments of the distribution of
samples may be determined with any desired degree of accuracy.


1See Annals, Vol. I, page 104.

Since it is evident that the solution of our problems now
depends upon our obtaining approximate expressions for these
distributions whose moments are known, we shall at this point
develop a general method of representing discrete distributions
which is essentially due to the researches of Charlier. Although
the results that we shall obtain are practically those that have
also been obtained by Gram, Edgeworth and others, the method
that we shall employ is that used by Charlier in "Die Strenge
Form des Bernoullischen Theorems."

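Let $f(x)$ be the frequency function for a discrete
distribution ranging from $x = l$ to $x = \lambda$. If the ordinates
be equidistant at intervals of $h$, the total frequency of the
distribution is

$$N = f(l) + f(l+h) + \cdots + f(x_1-h) + f(x_1) + f(x_1+h) + \cdots + f(\lambda)
   = \sum_{x=l}^{\lambda} f(x), \tag{3}$$

where our interest is focused on a typical ordinate at $x = x_1$.
If we now set up the function

$$\sum_{x=l}^{\lambda} f(x)\,e^{x\omega i} = f(l)\,e^{l\omega i} + \cdots
   + f(x_1-h)\,e^{(x_1-h)\omega i} + f(x_1)\,e^{x_1\omega i}
   + f(x_1+h)\,e^{(x_1+h)\omega i} + \cdots,$$

where $i = \sqrt{-1}$, and multiply each side by $e^{-x_1\omega i}$ so that

$$e^{-x_1\omega i}\sum_{x=l}^{\lambda} f(x)\,e^{x\omega i}
   = f(x_1) + f(x_1+h)\,e^{h\omega i} + f(x_1+2h)\,e^{2h\omega i} + \cdots
   + f(x_1-h)\,e^{-h\omega i} + f(x_1-2h)\,e^{-2h\omega i} + \cdots,$$

we obtain by integrating both members with respect to $\omega$
between the limits $\omega = -\pi/h$ and $\omega = \pi/h$

$$\int_{-\pi/h}^{\pi/h} e^{-x_1\omega i}\Big\{\sum_{x=l}^{\lambda} f(x)\,e^{x\omega i}\Big\}\,d\omega
   = \frac{2\pi}{h}\,f(x_1),$$

since the integral of all other terms of the right hand member will
vanish as follows:

$$\int_{-\pi/h}^{\pi/h} f(x_1+mh)\,e^{mh\omega i}\,d\omega
   = f(x_1+mh)\int_{-\pi/h}^{\pi/h}\big[\cos mh\omega + i\sin mh\omega\big]\,d\omega = 0$$

($m$ is an integer other than zero.)
It follows therefore that

$$f(x_1) = \frac{h}{2\pi}\int_{-\pi/h}^{\pi/h} e^{-x_1\omega i}
   \Big\{\sum_{x=l}^{\lambda} f(x)\,e^{x\omega i}\Big\}\,d\omega. \tag{4}$$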
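
A numerical illustration of the inversion just derived may be helpful. The sketch below assumes formula (4) in the form $f(x_1) = \frac{h}{2\pi}\int_{-\pi/h}^{\pi/h} e^{-x_1\omega i}\{\sum f(x)e^{x\omega i}\}\,d\omega$ and evaluates the integral by a simple rectangle rule over one full period, which is adequate here because the integrand is a finite trigonometric sum. The function name, the number of steps and the small test distribution are illustrative choices.

```python
import cmath
import math

def recover_ordinate(freqs, x1, h=1.0, steps=256):
    """Evaluate (h / 2 pi) * integral of e^{-x1 w i} * sum f(x) e^{x w i}."""
    width = 2 * math.pi / h
    dw = width / steps
    total = 0j
    for j in range(steps):
        w = -math.pi / h + j * dw
        generating = sum(f * cmath.exp(1j * x * w) for x, f in freqs.items())
        total += cmath.exp(-1j * x1 * w) * generating * dw
    return h / (2 * math.pi) * total

# A small discrete distribution on the integers 0..4 (h = 1).
freqs = {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}
for x in freqs:
    print(x, recover_ordinate(freqs, x).real)
```

Each ordinate is returned to within rounding error, the imaginary part being negligible.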










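Moreover, since

$$e^{-a\omega i} + e^{-(a+h)\omega i} + \cdots + e^{-b\omega i}
   = \frac{e^{-(b+h)\omega i} - e^{-a\omega i}}{e^{-h\omega i} - 1},$$

we see that the sum of all the consecutive frequencies from
$x = a$ to $x = b$ may be expressed as the definite integral

$$\sum_{x=a}^{b} f(x) = \frac{h}{2\pi}\int_{-\pi/h}^{\pi/h}
   \frac{e^{-(b+h)\omega i} - e^{-a\omega i}}{e^{-h\omega i} - 1}
   \Big\{\sum_{x=l}^{\lambda} f(x)\,e^{x\omega i}\Big\}\,d\omega. \tag{5}$$

The changing of the order of integration is permitted since the
limits are all finite.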

Ordinarily frequency distributions are expressed as
developments of the integral (4), and the sums of consecutive
frequencies obtained by applying the Euler-Maclaurin Sum-Formula to
these results. It seems at first sight that it might be well to place
a little more emphasis upon the evaluation of (5), since this as
it stands affords an exact expression for the sum of any group
of consecutive frequencies. For the case of continuous variates
we need only permit $h$ to approach zero, replace the sign of
summation by the sign of integration, etc., and after justifying
the change in the order of integration for the resulting infinite
limits obtain

$$\int_a^b f(x)\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty}
   \frac{e^{-b\omega i} - e^{-a\omega i}}{-\omega i}
   \Big\{\int_{-\infty}^{\infty} f(x)\,e^{x\omega i}\,dx\Big\}\,d\omega. \tag{6}$$

We shall now attempt to evaluate the definite integral (4).
Let us first observe that the quantity within the parenthesis is a
function of $\omega$, since the finite integration with respect to $x$
and the subsequent replacing of $x$ by the limits will cause this
distribution variable to disappear.

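For reasons which will develop later, let us write

$$\sum f(x)\,e^{x\omega i} = e^{b_1(\omega i) + \frac{b_2}{2}(\omega i)^2}
   \sum f(x)\,e^{(x-b_1)(\omega i) - \frac{b_2}{2}(\omega i)^2}.$$

If in Leibnitz' formula

$$D^n(uv) = u\,D^n v + n\,Du\cdot D^{n-1}v + \frac{n(n-1)}{2!}\,D^2u\cdot D^{n-2}v + \cdots$$

we place $u = e^{-\frac{b_2}{2}(\omega i)^2}$ and $v = e^{(x-b_1)(\omega i)}$
(differentiating with respect to $\omega i$), and note that

$$\big[D^{2n+1}u\big]_{\omega=0} = 0, \qquad
  \big[D^{2n}u\big]_{\omega=0} = (-1)^n\,\frac{(2n)!\;b_2^{\,n}}{2^n\,n!},$$

then

$$\Big[D^n e^{(x-b_1)(\omega i) - \frac{b_2}{2}(\omega i)^2}\Big]_{\omega=0}
   = (x-b_1)^n - \frac{n^{(2)}}{2\cdot 1!}\,b_2\,(x-b_1)^{n-2}
   + \frac{n^{(4)}}{2^2\,2!}\,b_2^2\,(x-b_1)^{n-4}
   - \frac{n^{(6)}}{2^3\,3!}\,b_2^3\,(x-b_1)^{n-6} + \cdots \tag{7}$$

where $n^{(s)} = n(n-1)(n-2)\cdots$ to $s$ factors.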





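Thus we may write

$$\sum f(x)\,e^{x\omega i} = N\,e^{b_1(\omega i) + \frac{b_2}{2}(\omega i)^2}
   \big[c_0 + c_1(\omega i) + c_2(\omega i)^2 + c_3(\omega i)^3 + \cdots\big], \tag{8}$$

and employing the notation

$$\sum (x-b_1)^n f(x) = N\mu_n,$$

we obtain from (7)

$$
\begin{aligned}
c_0 &= 1 \\
c_1 &= \mu_1 \\
c_2 &= \tfrac{1}{2!}\,(\mu_2 - b_2) \\
c_3 &= \tfrac{1}{3!}\,(\mu_3 - 3b_2\mu_1) \\
c_4 &= \tfrac{1}{4!}\,(\mu_4 - 6b_2\mu_2 + 3b_2^2) \\
c_n &= \tfrac{1}{n!}\Big(\mu_n - \frac{n^{(2)}}{2\cdot 1!}\,b_2\,\mu_{n-2}
       + \frac{n^{(4)}}{2^2\,2!}\,b_2^2\,\mu_{n-4} - \cdots\Big).
\end{aligned}
$$

Formula (4) may therefore be written, dropping the subscript
on $x$,

$$f(x) = \frac{Nh}{2\pi}\int_{-\pi/h}^{\pi/h}
   e^{-(x-b_1)\omega i - \frac{b_2}{2}\omega^2}
   \big[1 + c_1(\omega i) + c_2(\omega i)^2 + \cdots\big]\,d\omega.$$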


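Placing

$$\Phi(x) = \frac{1}{2\pi}\int_{-\pi/h}^{\pi/h}
   e^{-(x-b_1)\omega i - \frac{b_2}{2}\omega^2}\,d\omega, \tag{10}$$

it follows that the $n$th derivative with respect to $x$ is

$$\Phi^{(n)}(x) = \frac{1}{2\pi}\int_{-\pi/h}^{\pi/h} (-\omega i)^n\,
   e^{-(x-b_1)\omega i - \frac{b_2}{2}\omega^2}\,d\omega, \tag{11}$$

so finally

$$f(x) = N\,h\,\big[\Phi(x) - c_1\Phi'(x) + c_2\Phi''(x) - c_3\Phi'''(x) + \cdots\big]. \tag{12}$$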


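Let us now investigate the function $\Phi(x)$.

$$
\begin{aligned}
\Phi(x) &= \frac{1}{2\pi}\int_{-\pi/h}^{\pi/h} e^{-\frac{b_2}{2}\omega^2}
   \big[\cos (x-b_1)\omega - i\sin (x-b_1)\omega\big]\,d\omega \\
 &= \frac{1}{\pi}\int_{0}^{\pi/h} e^{-\frac{b_2}{2}\omega^2}\cos (x-b_1)\omega\;d\omega
   \qquad \big[\text{since } e^{-\frac{b_2}{2}\omega^2}\sin (x-b_1)\omega
   \text{ is an odd function of } \omega\big] \\
 &= \frac{1}{\pi}\int_{0}^{\infty} e^{-\frac{b_2}{2}\omega^2}\cos (x-b_1)\omega\;d\omega
   - \frac{1}{\pi}\int_{\pi/h}^{\infty} e^{-\frac{b_2}{2}\omega^2}\cos (x-b_1)\omega\;d\omega \\
 &= \frac{1}{\sqrt{2\pi b_2}}\,e^{-\frac{(x-b_1)^2}{2b_2}}
   - \frac{1}{\pi}\int_{\pi/h}^{\infty} e^{-\frac{b_2}{2}\omega^2}\cos (x-b_1)\omega\;d\omega \\
 &= \phi(x) - R_0.
\end{aligned}
$$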


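Likewise we may write

$$\Phi^{(n)}(x) = \phi^{(n)}(x) - R_n, \qquad
  |R_n| < \frac{1}{\pi}\int_{\pi/h}^{\infty} \omega^n\,e^{-\frac{b_2}{2}\omega^2}\,d\omega.$$

By successive integration by parts it can be shown that

$$
\begin{aligned}
\int_x^{\infty} x^n e^{-\frac{x^2}{2}}\,dx
 &= e^{-\frac{x^2}{2}}\Big\{x^{n-1} + (n-1)\,x^{n-3} + (n-1)(n-3)\,x^{n-5} + \cdots \\
 &\qquad\qquad + (n-1)(n-3)\cdots(n-2i+3)\,x^{n-2i+1}\Big\} \\
 &\quad + (n-1)(n-3)\cdots(n-2i+1)\int_x^{\infty} x^{n-2i}\,e^{-\frac{x^2}{2}}\,dx
\end{aligned}
\tag{13}
$$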




so we have that 


rye 
(14) B,<% ¢ By". — ME [ye ae (fy ote 9 ts. 


2 


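So far we have said nothing concerning the values of the
parameters $b_1$ and $b_2$. Referring to formula (8) it is seen
that if the origin of $x$ be taken at the mean of the distribution
in question, and $b_2$ equal the second moment about the mean
of this distribution, $c_1 = c_2 = 0$, and consequently if the
values of $R_n$ may be neglected, the equation of the distribution
expressed in standard units becomes

$$f(x) = \frac{Nh}{\sigma}\Big\{\phi(t) - A_3\,\phi'''(t) + A_4\,\phi^{iv}(t)
   - A_5\,\phi^{v}(t) + A_6\,\phi^{vi}(t) - \cdots\Big\} \tag{15}$$

where $t = \dfrac{x}{\sigma}$, $\phi(t) = \dfrac{1}{\sqrt{2\pi}}\,e^{-\frac{t^2}{2}}$, and

$$
\begin{aligned}
A_3 &= \frac{\alpha_3}{3!} \\
A_4 &= \frac{\alpha_4 - 3}{4!} \\
A_5 &= \frac{\alpha_5 - 10\,\alpha_3}{5!} \\
A_6 &= \frac{\alpha_6 - 15\,\alpha_4 + 30}{6!} \\
A_n &= \frac{1}{n!}\Big[\alpha_n - \frac{n^{(2)}}{2\cdot 1!}\,\alpha_{n-2}
      + \frac{n^{(4)}}{2^2\,2!}\,\alpha_{n-4}
      - \frac{n^{(6)}}{2^3\,3!}\,\alpha_{n-6} + \cdots\Big]
\end{aligned}
\tag{16}
$$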
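
To see how such a development behaves in practice, the series may be applied to the distribution of Example 1, whose exact frequencies are known. The sketch below is an illustration under stated assumptions, not the editorial's own computation: it assumes the conventional Gram-Charlier Type A normalization, in which the third and fourth terms carry the factors $\alpha_3/3!$ and $(\alpha_4-3)/4!$, stops after the fourth moment, and takes $h = 1$. All function names are illustrative.

```python
from math import comb, exp, pi, sqrt

def suit_count_probabilities():
    """Exact relative frequencies f(x)/N for Example 1."""
    total = comb(52, 13)
    return [comb(13, x) * comb(39, 13 - x) / total for x in range(14)]

def standardized_moments(probs):
    """Mean, standard deviation, alpha_3 and alpha_4 of the distribution."""
    mean = sum(x * p for x, p in enumerate(probs))

    def mu(k):
        return sum((x - mean) ** k * p for x, p in enumerate(probs))

    sigma = sqrt(mu(2))
    return mean, sigma, mu(3) / sigma ** 3, mu(4) / sigma ** 4

def gram_charlier(x, mean, sigma, a3, a4, h=1.0):
    """Series through the fourth moment, in standard units t."""
    t = (x - mean) / sigma
    phi = exp(-t * t / 2) / sqrt(2 * pi)
    h3 = t ** 3 - 3 * t           # equals -phi'''(t) / phi(t)
    h4 = t ** 4 - 6 * t ** 2 + 3  # equals  phi''''(t) / phi(t)
    return h / sigma * phi * (1 + a3 / 6 * h3 + (a4 - 3) / 24 * h4)

probs = suit_count_probabilities()
mean, sigma, a3, a4 = standardized_moments(probs)
for x in range(8):
    print(x, round(probs[x], 5), round(gram_charlier(x, mean, sigma, a3, a4), 5))
```

Printing the exact and approximate ordinates side by side shows where a four-moment representation is close and where it is not, which is the kind of comparison for which a remainder term would be wanted.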







By employing the Euler-Maclaurin Sum-Formula we can 


write 


Ff (a)+ lath) + lrar2h)e--- +1 (b-h) +f (b) 
(17) | 
b+h4-09. 


= [eerat-A;$ty+A, $60-A, $0) + AGO _ pot 
: —— 


mn 


". 453,42. 
24 og 


“% h* 
aa 
3 42 72z0a* 


Wty lay  Mg-3F hh? we 
A, We +s “a6 tet’ 


A %s-l0us h*® ~-3, he 
oe 


é e 
A a Ge -/F we +30 + E » 
240. o* 288 + «=302400° 


s 720 


In some cases it may be more convenient to employ a mean
and a standard deviation of the generating function that differ
somewhat from those of the distribution for which the
representation is desired. In this event the coefficients of the
first and second derivatives in (15) will not vanish. However, the
extra effort expended in increasing the number of significant terms
may be more than offset by the fact that a rather arbitrary choice
in the values of $b_1$ and $b_2$ may result in simpler values for

$$t = \frac{x - b_1}{\sqrt{b_2}},$$

which in turn may occasionally eliminate difficult interpolations
when dealing with tabulations of the generating function and its
derivatives.

Formulae (17) and (18) may be regarded as a sort of apol- 
ogy for the fact that the definite integral of formula (5) has 
never been developed. The need of a satisfactory expression for 
the sum of any number of consecutive variates is indeed acute. 

By permitting $h$ in the foregoing theory to approach zero,
one can obtain corresponding formulae for the ordinates and
areas of distributions of continuous variates. However, it should
be noted that for this case the limits for the integrals in the
vicinity of formula (4) are now

$$\lim_{h \to 0}\Big(\pm\frac{\pi}{h}\Big),$$

and consequently the changing of the order of integration must
be justified.

In conclusion we may state: 

I. Answers to problems of statistical sampling are usually 
expressed as finite or infinitesimal integrals under a function 
whose moments only are known. If known, the function is gen- 
erally of but little value. 

II. It is necessary to approximate the desired integrals by
employing frequency functions.

III. Present methods are unsatisfactory from the point of
view that remainder or limit of error terms are not available.
The $\chi^2$ test, though helpful, does not meet the issue in question.





NOTE ON THE DISTRIBUTION OF MEANS
OF SAMPLES OF N DRAWN FROM
A TYPE A POPULATION


By


Cecil C. Craig
National Research Fellow


Recently in this journal, Dr. George A. Baker has found
"the distribution of the means of samples drawn at random from
a population represented by a Gram-Charlier series."¹ It is the
purpose of this note to call attention to the fact that by the use
of the semi-invariant notation Dr. Baker's results may be reached
in far fewer steps.

Let the parent population be represented by

$$f(x) = \phi(x)\Big[1 + \frac{a_3}{\sigma_x^3}\,H_3\Big(\frac{x}{\sigma_x}\Big)
   + \frac{a_4}{\sigma_x^4}\,H_4\Big(\frac{x}{\sigma_x}\Big) + \cdots
   + \frac{a_k}{\sigma_x^k}\,H_k\Big(\frac{x}{\sigma_x}\Big)\Big] \tag{1}$$

in which

$$\phi(x) = \frac{1}{\sigma_x\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma_x^2}}, \tag{2}$$

the origin for $x$ being chosen at the mean, and

$$H_n(t)\,e^{-\frac{t^2}{2}} = D^n\Big(e^{-\frac{t^2}{2}}\Big). \tag{3}$$

We shall first find the distribution function of
$z = x_1 + x_2 + \cdots + x_N$, in which $x_i$, $i = 1, 2, \ldots, N$, has
the frequency function $f(x)$. Let us assume the frequency function
of $z$ is given by


¹Vol. 1, No. 3 (Aug., 1930), pp. 199-204.


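$$F(z) = \Phi(z)\Big[1 + \frac{A_3}{\sigma_z^3}\,H_3\Big(\frac{z}{\sigma_z}\Big)
   + \frac{A_4}{\sigma_z^4}\,H_4\Big(\frac{z}{\sigma_z}\Big) + \cdots\Big], \tag{4}$$

where $\Phi(z)$ is the function (2) with $\sigma_z$ in place of $\sigma_x$.
Then the semi-invariants of $f(x)$, $\lambda_1, \lambda_2, \ldots, \lambda_n, \ldots$,
are defined by the formal identity in $t$:

$$e^{\frac{\lambda_2 t^2}{2!} + \frac{\lambda_3 t^3}{3!} + \cdots}
   = \int_{-\infty}^{\infty} e^{xt} f(x)\,dx
   \qquad (\lambda_1 = 0 \text{ in this case}) \tag{5}$$

and on integration, using (3), we get at once on the right:

$$e^{\frac{\lambda_2 t^2}{2}}\big[1 - a_3t^3 + a_4t^4 - \cdots + (-1)^k a_k t^k\big].$$

Similarly for the semi-invariants $L_1, L_2, L_3, \ldots$ of $F(z)$
we have

$$e^{\frac{L_2 t^2}{2!} + \frac{L_3 t^3}{3!} + \cdots}
   = e^{\frac{L_2 t^2}{2}}\big[1 - A_3t^3 + A_4t^4 - \cdots + (-1)^s A_s t^s + \cdots\big]. \tag{6}$$

But because of the well-known fact that $L_n = N\lambda_n$ this
gives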



$$\big[1 - A_3t^3 + A_4t^4 - \cdots + (-1)^s A_s t^s + \cdots\big]
   = \big[1 - a_3t^3 + a_4t^4 - \cdots + (-1)^k a_k t^k\big]^N,$$

an identity in $t$. Thus

$$A_s = \sum \frac{N!}{\nu_3!\,\nu_4!\cdots\nu_k!\,(N - \nu_3 - \nu_4 - \cdots - \nu_k)!}\;
   a_3^{\nu_3}\,a_4^{\nu_4}\cdots a_k^{\nu_k}, \tag{7}$$

the summation including all terms for which
$3\nu_3 + 4\nu_4 + \cdots + k\nu_k = s$.

Remembering that $\sigma_z = \sqrt{L_2} = \sqrt{N}\,\sigma_x$, we have on
substitution in (4) the expression for $F(z)$, since only a finite
number of $A_s$'s (depending on $N$) are different from zero.

To get the distribution of $z' = \dfrac{x_1 + x_2 + \cdots + x_N}{N}$ only
involves the appropriate change of unit.
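
The passage from the identity in $t$ to the coefficients $A_s$ can be checked mechanically. The sketch below assumes the identity in the form $[1 - a_3t^3 + a_4t^4 - \cdots]^N = 1 - A_3t^3 + A_4t^4 - \cdots$, so that $(-1)^s A_s$ is the coefficient of $t^s$ in the $N$th power, and compares this with the multinomial sum. The function names and the numerical values of $a_3$, $a_4$ and $N$ are illustrative.

```python
from itertools import product
from math import factorial

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def coefficients_by_expansion(a, n):
    """A_s from the N-th power of 1 + sum((-1)^j a_j t^j)."""
    base = [0] * (max(a) + 1)
    base[0] = 1
    for j, value in a.items():
        base[j] = (-1) ** j * value
    power = [1]
    for _ in range(n):
        power = poly_mul(power, base)
    return {s: (-1) ** s * c for s, c in enumerate(power) if s >= 3}

def coefficient_by_formula(a, n, s):
    """A_s as the multinomial sum over nu_j with sum(j * nu_j) = s."""
    orders = sorted(a)
    total = 0
    for nus in product(*(range(s // j + 1) for j in orders)):
        if sum(j * v for j, v in zip(orders, nus)) != s or sum(nus) > n:
            continue
        term = factorial(n) // factorial(n - sum(nus))
        for j, v in zip(orders, nus):
            term = term // factorial(v) * a[j] ** v
        total += term
    return total

a = {3: 2, 4: 5}  # hypothetical values of a_3 and a_4
expanded = coefficients_by_expansion(a, 4)
for s in sorted(expanded):
    print(s, expanded[s], coefficient_by_formula(a, 4, s))
```

With $k = 4$ and $N = 4$ the highest non-vanishing coefficient is $A_{16} = a_4^4$, illustrating the remark that only a finite number of the $A_s$ differ from zero.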


Stanford University. 





